Graphs Rarely Fail on Tech, They Fail on Habits

Agency Script Editorial

Editorial Team

June 22, 2025 · 7 min read

Knowledge graph projects rarely fail because the technology doesn't work. They fail because of a small set of recurring mistakes that look reasonable in the moment and compound silently until the graph is unusable. Having seen these patterns play out repeatedly, I want to name each one precisely: what it is, why smart people make it, what it costs, and the specific corrective practice.

These aren't abstract warnings. Each mistake below has a clear failure signature you can check for in your own project right now. If you're building or maintaining a knowledge graph, read these as a diagnostic. If you're about to start one, read them as a map of where the cliffs are. The step-by-step guide shows the happy path; this article shows the ditches beside it.

Mistake 1: Modeling Your Data Instead of Your Questions

Why it happens: You have data, so you start by modeling what you have. It feels productive.

The cost: You build a sprawling graph that mirrors your source systems and answers no question well. Every node type from every source ends up represented, and the graph becomes as hard to query as the silos it was meant to unify.

The fix: Start with three to five concrete questions and model only what answers them. The question list is your scope guard. If a node or edge doesn't serve a question, cut it. This single discipline prevents more failures than any other.
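
One way to make the question list an enforceable scope guard is to record which node and edge types each question needs, then flag anything in the schema that no question uses. This is a minimal sketch: the questions, the type names, and the `unused_types` helper are all hypothetical and not tied to any particular graph tool.

```python
# Hypothetical question list: each question names the node and edge types it needs.
QUESTIONS = {
    "Which clients share a vendor?": {"Client", "Vendor", "BUYS_FROM"},
    "Who manages each client account?": {"Client", "Employee", "MANAGES"},
}

# Hypothetical schema as currently modeled, including types copied from source systems.
SCHEMA_TYPES = {
    "Client", "Vendor", "Employee", "Invoice",
    "BUYS_FROM", "MANAGES", "HAS_TICKET",
}


def unused_types(schema_types, questions):
    """Return schema types that no question requires, sorted for stable output."""
    needed = set().union(*questions.values())
    return sorted(schema_types - needed)


# Candidates to cut: they serve no question on the list.
print(unused_types(SCHEMA_TYPES, QUESTIONS))
```

Anything this prints is either a cut candidate or a sign that a real question is missing from the list.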

Mistake 2: Skipping Entity Resolution

Why it happens: Loading raw records is fast and visible. Deduplicating them is slow and invisible.

The cost: "Acme Corp," "Acme Corporation," and "ACME INC" become three separate nodes. Now every query that touches Acme returns partial answers, and you don't notice because the query runs — it just lies. This is the single most damaging mistake because it silently corrupts correctness.

The fix: Treat entity resolution as a first-class step, not cleanup. Define matching rules (exact, fuzzy, or model-assisted), run them on ingest, and spot-check merges. Budget real time here. Our best practices article expands on resolution strategy.
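
To make the matching rules concrete, here is a minimal sketch of exact-plus-fuzzy resolution using only Python's standard library. The suffix list, the 0.9 threshold, and the `normalize` and `resolve` helpers are assumptions for illustration. A production pipeline would likely use richer rules, blocking to avoid comparing every pair, and human review of borderline merges.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(names, threshold=0.9):
    """Group raw names whose normalized forms match exactly or fuzzily.

    Returns a dict mapping a canonical key to the raw names merged under it.
    """
    clusters = {}
    for raw in names:
        key = normalize(raw)
        # Reuse the first existing cluster that is similar enough, else start a new one.
        match = next(
            (k for k in clusters if SequenceMatcher(None, k, key).ratio() >= threshold),
            key,
        )
        clusters.setdefault(match, []).append(raw)
    return clusters


records = ["Acme Corp", "Acme Corporation", "ACME INC", "Apex Ltd"]
for canonical, merged in resolve(records).items():
    # Printing the merges is the spot-check: a human should eyeball these.
    print(canonical, "<-", merged)
```

The three Acme variants collapse into one cluster while Apex stays separate. Lowering the threshold merges more aggressively, so spot-check merges whenever you change it.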

Mistake 3: Over-Engineering the Ontology Up Front

Why it happens: Ontologies feel rigorous, and a thorough one seems professional.

The cost: You spend weeks formalizing class hierarchies and inference rules before answering a single question. The ontology becomes a project unto itself, and by the time it's "done," requirements have changed.

The fix: Start with the lightest schema that works — a handful of node and edge types. Add ontological structure (inheritance, constraints, inference) only when a real question requires it. Formal ontologies earn their cost in regulated, knowledge-heavy domains; elsewhere they're often premature.
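
For a sense of how small "lightest schema that works" can be, the sketch below defines two edge types and a single validation function. The type names and the `is_valid_edge` helper are hypothetical. The point is that a plain mapping can serve as the schema until a question demands inheritance or inference.

```python
# A deliberately small schema: edge type -> (source node type, target node type).
# All type names here are hypothetical.
SCHEMA = {
    "WORKS_FOR": ("Person", "Company"),
    "SUPPLIES": ("Company", "Company"),
}


def is_valid_edge(edge_type, source_type, target_type, schema=SCHEMA):
    """Check an edge against the schema; unknown edge types are rejected."""
    return schema.get(edge_type) == (source_type, target_type)


print(is_valid_edge("WORKS_FOR", "Person", "Company"))  # allowed by the schema
print(is_valid_edge("OWNS", "Person", "Company"))       # not in the schema yet
```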

Mistake 4: Edge-Type Sprawl

Why it happens: Every new requirement seems to need a new relationship type, so the count creeps: WORKS_FOR, EMPLOYED_BY, STAFFS, CONTRACTS_WITH.

The cost: Queries break because the same real-world relationship is split across synonyms. Nobody can remember which edge type to use, and traversals miss data stored under a sibling type.

The fix: Maintain a controlled vocabulary of edge types and review additions. Before adding an edge type, ask whether an existing one plus a property would do. Often WORKS_FOR with a type property beats four near-identical edges.
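
A controlled vocabulary can be as simple as a lookup table applied on ingest. The sketch below folds the four edge types named above into WORKS_FOR plus a type property. The specific mapping (for example, treating STAFFS as an employee relationship) and the `canonicalize` helper are assumptions you would adjust to your own domain.

```python
# Hypothetical controlled vocabulary: incoming edge type -> (canonical type, properties).
EDGE_VOCABULARY = {
    "WORKS_FOR": ("WORKS_FOR", {"type": "employee"}),
    "EMPLOYED_BY": ("WORKS_FOR", {"type": "employee"}),
    "STAFFS": ("WORKS_FOR", {"type": "employee"}),
    "CONTRACTS_WITH": ("WORKS_FOR", {"type": "contractor"}),
}


def canonicalize(edge_type):
    """Map an incoming edge type to its canonical form, or fail loudly."""
    if edge_type not in EDGE_VOCABULARY:
        # Unknown types are not silently added; they go through review first.
        raise ValueError(f"{edge_type!r} is not in the controlled vocabulary")
    canonical, props = EDGE_VOCABULARY[edge_type]
    return canonical, dict(props)  # copy so callers cannot mutate the vocabulary


print(canonicalize("CONTRACTS_WITH"))
```

Raising on unknown types is the design choice that matters here: it turns "review additions" from a policy into something the ingest code enforces.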

Mistake 5: Treating the Graph as Write-Once

Why it happens: The initial build is hard, so once it works, teams stop touching the model and just append data.

The cost: The world changes, new questions arrive, and the frozen model can't answer them. Workarounds pile up — properties stuffed with JSON, fake nodes — until the graph is a museum of past decisions.

The fix: Treat the model as living. Schedule periodic reviews where you add, rename, or retire node and edge types deliberately. A graph that doesn't evolve with its questions slowly becomes irrelevant.

Mistake 6: Ignoring Provenance

Why it happens: You're focused on the facts themselves, not where they came from.

The cost: When a fact turns out to be wrong, you can't tell which source introduced it or which other facts share that source's reliability. You also can't satisfy audit or compliance requirements. For AI grounding, this is fatal — you can't cite a source you didn't record.

The fix: Attach provenance properties — source, ingest date, confidence — to nodes and edges from day one. Retrofitting provenance is far harder than capturing it on ingest. This matters most when the graph feeds an LLM, as covered in the complete guide.
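
As an illustration of capturing provenance on ingest, the sketch below makes the three properties required fields on every edge, so an edge cannot be created without them. The `Edge` class, its field names, and the sample sources are hypothetical, and most graph databases would store these as ordinary edge properties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Edge:
    from_node: str
    edge_type: str
    to_node: str
    # Provenance fields, captured at ingest time and required for every edge.
    source: str          # which system or document asserted this fact
    ingested_on: date    # when it was loaded
    confidence: float    # 0.0 to 1.0, assigned by the ingest pipeline


def edges_from_source(edges, source):
    """Find every edge that a given source introduced."""
    return [e for e in edges if e.source == source]


edges = [
    Edge("p1", "WORKS_FOR", "c1", source="crm_export",
         ingested_on=date(2025, 6, 1), confidence=0.95),
    Edge("p2", "WORKS_FOR", "c1", source="web_scrape",
         ingested_on=date(2025, 6, 2), confidence=0.6),
]

# If a web_scrape fact turns out to be wrong, list everything else that source produced.
suspect = edges_from_source(edges, "web_scrape")
print(suspect)
```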

Mistake 7: Using a Graph When a Table Would Do

Why it happens: Graphs are exciting, and "we have a knowledge graph" sounds impressive.

The cost: You take on a graph's operational complexity — unfamiliar query language, specialized storage, harder aggregations — for a problem that was mostly lists and totals. The team struggles, and the graph delivers less than the relational system it replaced.

The fix: Apply the relationship test honestly. If your highest-value questions traverse three or more relationships, a graph earns its keep. If they're "sum this column where that condition holds," stay relational. Graphs and warehouses coexist; you don't have to choose one religion. We weigh this trade-off with concrete cases in real-world examples.
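
The relationship test can be written down so the team argues about hop counts, not about enthusiasm. In this sketch the sample questions, their hop counts, and the "more than half" rule are all assumptions. The only part taken from the test above is the three-relationship threshold.

```python
# Hypothetical top questions with the number of relationships each must traverse.
TOP_QUESTIONS = {
    "Which clients are exposed to a flagged supplier's parent company?": 3,
    "What was total revenue last quarter?": 0,
    "Which consultants worked with former colleagues of a lead?": 4,
}


def graph_earns_its_keep(hops_by_question, min_hops=3):
    """Return True when more than half of the questions need min_hops or more traversals."""
    deep = [q for q, hops in hops_by_question.items() if hops >= min_hops]
    return len(deep) > len(hops_by_question) / 2


print(graph_earns_its_keep(TOP_QUESTIONS))
```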

A Quick Self-Diagnosis

You can check your own project against six of the seven mistakes in about fifteen minutes; the list below has no quick test for Mistake 5. Run this short audit:

  • Pull your written question list. No list? You're at risk of Mistake 1.
  • Count the nodes for your single most important entity. More than one? Mistake 2 is live.
  • Count your node and edge types. Can you explain each from memory? If not, suspect Mistakes 3 and 4.
  • Pick a random edge. Can you say which source produced it and when? If not, Mistake 6.
  • Ask whether your top three questions traverse multiple relationships. If they're sums and averages, suspect Mistake 7.
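
Parts of this audit can be scripted. The sketch below runs three of the checks against a hypothetical in-memory export of nodes and edges. The dictionary layout and the helper names are assumptions, and a real audit would run equivalent queries against your graph database.

```python
from collections import Counter

# Hypothetical export of a small graph.
nodes = [
    {"id": "n1", "label": "Company", "name": "Acme Corp"},
    {"id": "n2", "label": "Company", "name": "ACME Corp"},
    {"id": "n3", "label": "Person", "name": "Dana Lee"},
]
edges = [
    {"type": "WORKS_FOR", "from": "n3", "to": "n1",
     "source": "crm", "ingested_on": "2025-06-01"},
    {"type": "WORKS_FOR", "from": "n3", "to": "n2"},
]


def duplicate_names(nodes):
    """Mistake 2 check: names that appear on more than one node, ignoring case."""
    counts = Counter(n["name"].casefold() for n in nodes)
    return {name: c for name, c in counts.items() if c > 1}


def edges_missing_provenance(edges, required=("source", "ingested_on")):
    """Mistake 6 check: edges lacking any required provenance property."""
    return [e for e in edges if any(not e.get(key) for key in required)]


def type_counts(nodes, edges):
    """Mistakes 3 and 4 check: how many distinct node and edge types must you explain?"""
    return len({n["label"] for n in nodes}), len({e["type"] for e in edges})


print(duplicate_names(nodes))
print(edges_missing_provenance(edges))
print(type_counts(nodes, edges))
```

The case-insensitive name comparison only catches the crudest duplicates. Treat a clean result as weak evidence, not proof that entity resolution is sound.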

Most struggling graphs fail at least two of these checks. The value of the audit is that each failed check points to a specific, named corrective practice instead of leaving you with a vague sense that "the graph isn't working." Diagnosis precedes repair.

How These Mistakes Compound

The dangerous part is that these errors interact. Skipping entity resolution (Mistake 2) makes the graph give wrong answers; not recording provenance (Mistake 6) means you can't trace those wrong answers to a source; treating the graph as write-once (Mistake 5) means nobody fixes the model that produced them. A project can carry all three for a year before anyone realizes the graph has been quietly wrong the whole time. Catching one mistake early often prevents the cascade.

Frequently Asked Questions

Which mistake is the most common?

Modeling data instead of questions (Mistake 1) is the most common starting error, and skipping entity resolution (Mistake 2) is the most damaging to correctness. They often appear together: a team that didn't define questions also didn't think carefully about what counts as a distinct entity.

How do I know if my graph has an entity resolution problem?

Pick an important entity you know well and count its nodes. If "your biggest customer" appears as more than one node, you have a resolution problem, and it's almost certainly widespread. Run this check on five key entities — it takes minutes and reveals the scale of the issue.

Is a formal ontology ever worth building up front?

Yes, in domains where meaning is contested and regulated — healthcare, law, finance — and where shared vocabularies already exist to adopt. There, the ontology prevents costly ambiguity. Outside those domains, start light and add structure only when a real question forces it.

Can AI tools cause these mistakes?

They can amplify Mistakes 2 and 4. An LLM extracting entities from text will happily create duplicate nodes and invent near-synonym edge types unless you constrain it. AI accelerates building, which means it accelerates building badly if you haven't set the guardrails first.
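
One possible guardrail is to filter whatever the model extracts through your edge vocabulary before it reaches the graph. In the sketch below, the triples stand in for LLM output, and the allowed set and the `apply_guardrails` helper are hypothetical. Duplicate nodes would still need a separate entity resolution pass.

```python
ALLOWED_EDGE_TYPES = {"WORKS_FOR", "SUPPLIES"}  # hypothetical controlled vocabulary


def apply_guardrails(triples, allowed=ALLOWED_EDGE_TYPES):
    """Split (subject, edge_type, object) triples into accepted and held-for-review."""
    accepted, review = [], []
    for triple in triples:
        edge_type = triple[1]
        (accepted if edge_type in allowed else review).append(triple)
    return accepted, review


# Stand-in for triples an LLM extracted from text.
extracted = [
    ("Dana Lee", "WORKS_FOR", "Acme Corp"),
    ("Dana Lee", "IS_EMPLOYED_AT", "Acme Corp"),  # invented near-synonym
]

accepted, review = apply_guardrails(extracted)
print("accepted:", accepted)
print("needs review:", review)
```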

What's the cheapest mistake to fix?

Edge-type sprawl (Mistake 4), if caught early — you consolidate synonyms and add a controlled vocabulary. The expensive ones are entity resolution and missing provenance, because both require reprocessing historical data. Prevention is dramatically cheaper than remediation for those two.

Key Takeaways

  • Model your questions, not your data — a question list is your scope guard.
  • Entity resolution is the highest-stakes step; skipping it silently corrupts every answer.
  • Start with a light schema and add ontological rigor only when a real question demands it.
  • Control your edge-type vocabulary and treat the model as living, not write-once.
  • Record provenance from day one and reserve graphs for genuinely relationship-heavy problems.
