Graphs Rarely Fail on Tech, They Fail on Habits

Knowledge graph projects rarely fail because the technology doesn't work. They fail because of a small set of recurring mistakes that look reasonable in the moment and compound silently until the graph is unusable. Having seen these patterns play out repeatedly, I want to name each one precisely: what it is, why smart people make it, what it costs, and the specific corrective practice.

These aren't abstract warnings. Each mistake below has a clear failure signature you can check for in your own project right now. If you're building or maintaining a knowledge graph, read these as a diagnostic. If you're about to start one, read them as a map of where the cliffs are. The step-by-step guide shows the happy path; this article shows the ditches beside it.

Mistake 1: Modeling Your Data Instead of Your Questions

Why it happens: You have data, so you start by modeling what you have. It feels productive.

The cost: You build a sprawling graph that mirrors your source systems and answers no question well. Every node type from every source ends up represented, and the graph becomes as hard to query as the silos it was meant to unify.

The fix: Start with three to five concrete questions and model only what answers them. The question list is your scope guard. If a node or edge doesn't serve a question, cut it. This single discipline prevents more failures than any other.
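One way to turn the question list into an enforceable scope guard is to record the node and edge types each question needs, then flag any schema type that no question uses. The sketch below is a minimal Python example. The questions, the type names, and the `unused_types` helper are all hypothetical.

```python
# Hypothetical scope list: each question maps to the node and edge types it needs.
QUESTIONS = {
    "Which suppliers do our top customers depend on?": {
        "Customer", "Supplier", "BUYS_FROM", "DEPENDS_ON",
    },
    "Who owns each contract?": {"Contract", "Person", "OWNS"},
}


def unused_types(schema_types, questions):
    """Return schema types that no question in the scope list needs."""
    needed = set().union(*questions.values())
    return sorted(set(schema_types) - needed)


schema = {
    "Customer", "Supplier", "Contract", "Person",
    "BUYS_FROM", "DEPENDS_ON", "OWNS",
    "AuditLogEntry", "LEGACY_REF",  # illustrative types mirrored from a source system
}
print(unused_types(schema, QUESTIONS))  # ['AuditLogEntry', 'LEGACY_REF']
```

Anything the check reports is either a candidate to cut or a sign that a question is missing from the list.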

Mistake 2: Skipping Entity Resolution

Why it happens: Loading raw records is fast and visible. Deduplicating them is slow and invisible.

The cost: "Acme Corp," "Acme Corporation," and "ACME INC" become three separate nodes. Now every query that touches Acme returns partial answers, and you don't notice because the query still runs; it just lies. This is the single most damaging mistake because it silently corrupts your answers.

The fix: Treat entity resolution as a first-class step, not cleanup. Define matching rules (exact, fuzzy, or model-assisted), run them on ingest, and spot-check merges. Budget real time here. Our best practices article expands on resolution strategy.
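Here is a minimal sketch of rule-based matching in Python. It applies an exact rule on normalized names first and falls back to a fuzzy comparison using the standard library's `difflib.SequenceMatcher`. The suffix list, the 0.9 threshold, and the greedy clustering are illustrative assumptions, not a recommended production strategy. Model-assisted matching would replace or augment `same_entity`.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def normalize(name):
    """Exact-match rule: lowercase, drop punctuation and legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def same_entity(a, b, threshold=0.9):
    """Exact match on normalized names, else a fuzzy similarity check."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def resolve(names, threshold=0.9):
    """Greedily group raw names into clusters, one per resolved entity."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if same_entity(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:  # no existing cluster matched: start a new entity
            clusters.append([name])
    return clusters


print(resolve(["Acme Corp", "Acme Corporation", "ACME INC", "Apex Ltd"]))
# [['Acme Corp', 'Acme Corporation', 'ACME INC'], ['Apex Ltd']]
```

Clusters with more than one member are the merges worth spot-checking before they are written to the graph.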

Mistake 3: Over-Engineering the Ontology Up Front

Why it happens: Ontologies feel rigorous, and a thorough one seems professional.

The cost: You spend weeks formalizing class hierarchies and inference rules before answering a single question. The ontology becomes a project unto itself, and by the time it's "done," requirements have changed.

The fix: Start with the lightest schema that works — a handful of node and edge types. Add ontological structure (inheritance, constraints, inference) only when a real question requires it. Formal ontologies earn their cost in regulated, knowledge-heavy domains; elsewhere they're often premature.

Mistake 4: Edge-Type Sprawl

Why it happens: Every new requirement seems to need a new relationship type, so the count creeps: WORKS_FOR, EMPLOYED_BY, STAFFS, CONTRACTS_WITH.

The cost: Queries break because the same real-world relationship is split across synonyms. Nobody can remember which edge type to use, and traversals miss data stored under a sibling type.

The fix: Maintain a controlled vocabulary of edge types and review additions. Before adding an edge type, ask whether an existing one plus a property would do. Often WORKS_FOR with a type property beats four near-identical edges.
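The controlled vocabulary can be enforced at write time. This sketch assumes a hypothetical set of allowed edge types and a reviewed synonym table. The table folds `EMPLOYED_BY` and `CONTRACTS_WITH` into `WORKS_FOR` with a `type` property. Any other edge type is rejected rather than silently added.

```python
# Hypothetical controlled vocabulary, maintained by review.
ALLOWED_EDGE_TYPES = {"WORKS_FOR", "OWNS", "SUPPLIES"}

# Reviewed synonyms: folded into a canonical type plus a distinguishing property.
SYNONYMS = {
    "EMPLOYED_BY": ("WORKS_FOR", {"type": "employee"}),
    "CONTRACTS_WITH": ("WORKS_FOR", {"type": "contractor"}),
}


def canonicalize(edge_type, props=None):
    """Map an incoming edge type onto the vocabulary.

    Returns (canonical_type, properties). Unknown types raise instead of
    quietly becoming a new sibling type.
    """
    props = dict(props or {})
    if edge_type in ALLOWED_EDGE_TYPES:
        return edge_type, props
    if edge_type in SYNONYMS:
        canonical, implied = SYNONYMS[edge_type]
        return canonical, {**implied, **props}  # caller's properties win
    raise ValueError(
        f"edge type {edge_type!r} is not in the vocabulary; propose it for review"
    )


print(canonicalize("EMPLOYED_BY", {"since": 2021}))
# ('WORKS_FOR', {'type': 'employee', 'since': 2021})
try:
    canonicalize("STAFFS")
except ValueError as err:
    print(err)
```

`STAFFS` is deliberately absent from the table. Writing it fails until a reviewer decides whether it is a synonym or a genuinely new relationship.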

Mistake 5: Treating the Graph as Write-Once

Why it happens: The initial build is hard, so once it works, teams stop touching the model and just append data.

The cost: The world changes, new questions arrive, and the frozen model can't answer them. Workarounds pile up — properties stuffed with JSON, fake nodes — until the graph is a museum of past decisions.

The fix: Treat the model as living. Schedule periodic reviews where you add, rename, or retire node and edge types deliberately. A graph that doesn't evolve with its questions slowly becomes irrelevant.

Mistake 6: Ignoring Provenance

Why it happens: You're focused on the facts themselves, not where they came from.

The cost: When a fact turns out to be wrong, you can't tell which source introduced it or which other facts share that source's reliability. You also can't satisfy audit or compliance requirements. For AI grounding, this is fatal — you can't cite a source you didn't record.

The fix: Attach provenance properties — source, ingest date, confidence — to nodes and edges from day one. Retrofitting provenance is far harder than capturing it on ingest. This matters most when the graph feeds an LLM, as covered in the complete guide.
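The following sketch shows provenance captured on ingest. It assumes edges are plain Python objects. The field names mirror the properties named above (source, ingest date, confidence), and the example sources and values are invented. When provenance is stored on every edge, tracing back a bad source becomes a one-line filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    source: str        # system or document the fact came from
    ingested_on: date  # when the pipeline loaded it
    confidence: float  # 0.0 to 1.0, however your pipeline scores it


@dataclass(frozen=True)
class Edge:
    subject: str
    edge_type: str
    obj: str
    provenance: Provenance  # required field: no default value


def edges_from_source(edges, source):
    """Return every fact a given source contributed, e.g. after it proves wrong."""
    return [e for e in edges if e.provenance.source == source]


crm = Provenance("crm_export", date(2024, 3, 1), 0.95)
scraped = Provenance("web_scrape", date(2024, 3, 2), 0.6)
edges = [
    Edge("alice", "WORKS_FOR", "acme", crm),
    Edge("bob", "WORKS_FOR", "acme", scraped),
    Edge("acme", "SUPPLIES", "globex", scraped),
]
for e in edges_from_source(edges, "web_scrape"):
    print(e.subject, e.edge_type, e.obj, e.provenance.confidence)
```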

Mistake 7: Using a Graph When a Table Would Do

Why it happens: Graphs are exciting, and "we have a knowledge graph" sounds impressive.

The cost: You take on a graph's operational complexity — unfamiliar query language, specialized storage, harder aggregations — for a problem that was mostly lists and totals. The team struggles, and the graph delivers less than the relational system it replaced.

The fix: Apply the relationship test honestly. If your highest-value questions traverse three or more relationships, a graph earns its keep. If they're "sum this column where that condition holds," stay relational. Graphs and warehouses coexist; you don't have to choose one religion. We weigh this trade-off with concrete cases in real-world examples.
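To make the relationship test concrete, here is a hypothetical contrast in plain Python. The table-shaped question reduces to a single filter and sum. The graph-shaped question asks what a customer depends on up to three relationships away, which requires a traversal. In SQL, that typically means one join per hop or a recursive CTE. The data and the `within_hops` helper are illustrative.

```python
from collections import deque

# Table-shaped question: "sum this column where that condition holds".
rows = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
    {"region": "EU", "amount": 30},
]
eu_total = sum(r["amount"] for r in rows if r["region"] == "EU")


def within_hops(adjacency, start, max_hops):
    """Breadth-first search returning {node: hop_count} for nodes within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {n: h for n, h in hops.items() if n != start}


# Graph-shaped question: what does this customer depend on, up to three hops out?
supply_chain = {
    "our_customer": ["vendor_a"],
    "vendor_a": ["supplier_x", "supplier_y"],
    "supplier_x": ["mine_1"],
    "mine_1": ["port_9"],
}
print(eu_total)  # 150
print(within_hops(supply_chain, "our_customer", 3))
# {'vendor_a': 1, 'supplier_x': 2, 'supplier_y': 2, 'mine_1': 3}
```

If most of your important questions look like the first computation, that points toward staying relational.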

A Quick Self-Diagnosis

You can check your own project against six of the seven mistakes in about fifteen minutes. Run this short audit:

Pull your written question list. No list? You're at risk of Mistake 1.
Count the nodes for your single most important entity. More than one? Mistake 2 is live.
Count your node and edge types. Can you explain each from memory? If not, suspect Mistakes 3 and 4.
Pick a random edge. Can you say which source produced it and when? If not, Mistake 6.
Ask whether your top three questions traverse multiple relationships. If they're mostly sums and averages, you may be making Mistake 7.
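Parts of this audit can be scripted. The sketch below assumes the graph is available as in-memory lists of node and edge dictionaries with hypothetical `type`, `name`, `source`, and `ingested_on` fields. It covers the mechanical checks for Mistakes 2, 3/4, and 6. The question-list and question-shape checks (Mistakes 1 and 7) still need a human. The entity match is only a crude substring test.

```python
import random


def audit(nodes, edges, key_entity, rng=None):
    """Run the mechanical audit checks and return a list of findings.

    nodes: dicts with "type" and "name"; edges: dicts with "type" and,
    ideally, "source" and "ingested_on". Field names are assumptions.
    """
    rng = rng or random.Random()
    findings = []
    # Mistake 2: one real-world entity should appear as exactly one node.
    matches = [n for n in nodes if key_entity.lower() in n["name"].lower()]
    if len(matches) > 1:
        findings.append(f"Mistake 2: {len(matches)} nodes match {key_entity!r}")
    # Mistakes 3 and 4: surface the type inventory so someone can explain each entry.
    node_types = sorted({n["type"] for n in nodes})
    edge_types = sorted({e["type"] for e in edges})
    findings.append(
        f"Review {len(node_types)} node types and {len(edge_types)} edge types: "
        f"{node_types + edge_types}"
    )
    # Mistake 6: a randomly picked edge should say where it came from and when.
    if edges:
        edge = rng.choice(edges)
        if not (edge.get("source") and edge.get("ingested_on")):
            findings.append("Mistake 6: sampled edge lacks a source or ingest date")
    return findings


nodes = [
    {"type": "Company", "name": "Acme Corp"},
    {"type": "Company", "name": "ACME INC"},
    {"type": "Person", "name": "Alice"},
]
edges = [
    {"type": "WORKS_FOR"},
    {"type": "EMPLOYED_BY", "source": "hr_feed"},
]
for finding in audit(nodes, edges, "Acme"):
    print(finding)
```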

Most struggling graphs fail at least two of these checks. The value of running it is that each failed check points to a specific, named corrective practice rather than a vague sense that "the graph isn't working." Diagnosis precedes repair.

How These Mistakes Compound

The dangerous part is that these errors interact. Skipping entity resolution (Mistake 2) makes the graph give wrong answers; not recording provenance (Mistake 6) means you can't trace those wrong answers to a source; treating the graph as write-once (Mistake 5) means nobody fixes the model that produced them. A project can carry all three for a year before anyone realizes the graph has been quietly wrong the whole time. Catching one mistake early often prevents the cascade.

Frequently Asked Questions

Which mistake is the most common?

Modeling data instead of questions (Mistake 1) is the most common starting error, and skipping entity resolution (Mistake 2) is the most damaging to correctness. They often appear together: a team that didn't define questions also didn't think carefully about what counts as a distinct entity.

How do I know if my graph has an entity resolution problem?

Pick an important entity you know well and count its nodes. If "your biggest customer" appears as more than one node, you have a resolution problem, and it's almost certainly widespread. Run this check on five key entities — it takes minutes and reveals the scale of the issue.

Is a formal ontology ever worth building up front?

Yes, in domains where meaning is contested and regulated — healthcare, law, finance — and where shared vocabularies already exist to adopt. There, the ontology prevents costly ambiguity. Outside those domains, start light and add structure only when a real question forces it.

Can AI tools cause these mistakes?

They can amplify Mistakes 2 and 4. An LLM extracting entities from text will happily create duplicate nodes and invent near-synonym edge types unless you constrain it. AI accelerates building, which means it accelerates building badly if you haven't set the guardrails first.

What's the cheapest mistake to fix?

Edge-type sprawl (Mistake 4), if caught early — you consolidate synonyms and add a controlled vocabulary. The expensive ones are entity resolution and missing provenance, because both require reprocessing historical data. Prevention is dramatically cheaper than remediation for those two.

Key Takeaways

Model your questions, not your data — a question list is your scope guard.
Entity resolution is the highest-stakes step; skipping it silently corrupts every answer.
Start with a light schema and add ontological rigor only when a real question demands it.
Control your edge-type vocabulary and treat the model as living, not write-once.
Record provenance from day one and reserve graphs for genuinely relationship-heavy problems.

Agency Script Editorial
