AGENCYSCRIPT


When Strict Schemas Beat Open-Ended Graph Extraction

Agency Script Editorial, Editorial Team · October 13, 2019 · 8 min read

Every team that builds a knowledge graph from text eventually hits the same fork in the road. Do you tell the model exactly which entity types and relationship types it is allowed to produce, or do you let it discover the structure of the document and report whatever it finds? The first path gives you a clean, queryable, predictable graph. The second path captures nuance you never anticipated. You cannot maximize both at once, and pretending you can is how projects drift into expensive rework.

This is not a question with a universal answer. It is a set of trade-offs whose right resolution depends on your domain, your tolerance for noise, and what you plan to do with the graph downstream. A graph feeding a strict compliance query needs different guarantees than a graph supporting exploratory research. The mistake is choosing by default rather than by analysis.

What follows lays out the competing approaches honestly, names the axes along which they differ, and ends with a decision rule you can actually apply. The aim is to make the trade-off visible so you choose it deliberately instead of inheriting it from whichever tutorial you happened to read first.

It helps to remember that this trade-off is not unique to graph extraction. It is the same tension between structure and flexibility that runs through database schema design, taxonomy work, and any effort to impose order on messy information. The reason it feels especially sharp here is that language models make both extremes cheap to attempt, so the constraint that used to come from implementation difficulty now has to come from your own judgment. That shift puts the burden of discipline squarely on the designer.

The Two Poles of Extraction Strategy

At one end sits closed-schema extraction. You define an ontology up front: entity types, relationship types, attribute constraints. The model fills in that template and nothing else.

At the other end sits open extraction, sometimes called open information extraction. The model returns subject-predicate-object triples in whatever vocabulary the text suggests, and you discover the schema after the fact by clustering what came back.

Why closed schemas feel safe

A closed schema means every node and edge is one of a known set of types. Queries are predictable. Validation is mechanical. Downstream consumers can build against a stable contract. The cost is that anything outside the schema is invisible, and writing a complete schema for a rich domain is genuinely hard.
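To make "validation is mechanical" concrete, here is a minimal Python sketch of a closed-schema check. The ontology (ENTITY_TYPES, RELATION_TYPES), the Triple class, and the validate function are hypothetical names for illustration. The point is only that every check is a set lookup or a tuple comparison, with no judgment involved.

```python
from dataclasses import dataclass

# Hypothetical ontology for illustration only: the allowed entity types and,
# for each relationship type, the (subject type, object type) pair it permits.
ENTITY_TYPES = {"Person", "Organization", "Location"}
RELATION_TYPES = {
    "works_for": ("Person", "Organization"),
    "headquartered_in": ("Organization", "Location"),
}


@dataclass(frozen=True)
class Triple:
    subject: str
    subject_type: str
    relation: str
    obj: str
    object_type: str


def validate(triple: Triple) -> list[str]:
    """Return every schema violation; an empty list means the triple conforms."""
    errors = []
    for role, etype in (("subject", triple.subject_type), ("object", triple.object_type)):
        if etype not in ENTITY_TYPES:
            errors.append(f"unknown {role} type: {etype}")
    if triple.relation not in RELATION_TYPES:
        errors.append(f"unknown relation: {triple.relation}")
    else:
        # Domain/range check: the relation may only connect its declared types.
        expected = RELATION_TYPES[triple.relation]
        if (triple.subject_type, triple.object_type) != expected:
            errors.append(f"{triple.relation} expects {expected[0]} -> {expected[1]}")
    return errors
```

Here validate(Triple("Ada", "Person", "works_for", "Acme", "Organization")) returns an empty list, while a triple using an off-schema relation such as "admires" comes back as ["unknown relation: admires"]. Anything the schema cannot name is rejected, which is exactly the invisibility cost of a closed schema.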

Why open extraction feels powerful

Open extraction surfaces relationships you would never have thought to encode. It adapts to documents whose structure you do not fully understand yet. The cost is noise: inconsistent predicates, near-duplicate relationships, and a normalization burden that lands on you after extraction rather than before.
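The normalization burden is easy to underestimate until you look at raw output. The sketch below, a hypothetical first pass rather than a recommended method, canonicalizes predicate strings and then groups near-duplicates by string similarity using Python's standard difflib module. The 0.8 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher


def normalize_predicate(pred: str) -> str:
    """Lowercase, turn punctuation into spaces, and join the words with underscores."""
    words = re.sub(r"[^a-z0-9]+", " ", pred.lower()).split()
    return "_".join(words)


def group_predicates(preds: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy single pass: each distinct normalized predicate joins the first group
    whose representative (first member) is at least `threshold` similar, else starts one."""
    groups: list[list[str]] = []
    for p in dict.fromkeys(normalize_predicate(x) for x in preds):  # dedupe, keep order
        for g in groups:
            if SequenceMatcher(None, g[0], p).ratio() >= threshold:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups
```

Given ["works for", "Works-For", "worked for", "is employed by", "employed by"], it returns [["works_for", "worked_for"], ["is_employed_by", "employed_by"]]. Note what it misses: "works_for" and "employed_by" mean the same thing but share few characters, so surface similarity keeps them apart. Closing that gap takes synonym lists, embeddings, or human review, and that is the cost that lands after extraction.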

The Axes That Actually Matter

The closed-versus-open framing is too coarse. Real decisions turn on a handful of independent axes.

  • Schema stability. Does your domain have a settled vocabulary, or are you still learning what matters? Settled vocabularies favor closed schemas.
  • Downstream consumer tolerance. Will the graph feed an automated system that breaks on surprises, or a human who can interpret messiness? Automated consumers favor closed.
  • Recall sensitivity. Is missing a true relationship costly, or merely annoying? High recall sensitivity favors open extraction with later filtering.
  • Normalization budget. How much engineering can you spend cleaning up after the model? Low budgets favor closed schemas that produce clean output natively.

These axes are independent, which is why the decision resists a one-line rule until you weigh them together.
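One way to keep the axes independent is to record each one separately rather than summing them into a single score. The sketch below uses a hypothetical ProjectProfile with yes/no answers, which is a simplification; real answers are usually matters of degree. "either" marks an axis that does not push in a particular direction.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    # Hypothetical yes/no encoding of the four axes; real answers are rarely this crisp.
    vocabulary_settled: bool        # schema stability
    automated_consumer: bool        # downstream consumer tolerance
    recall_critical: bool           # recall sensitivity
    low_normalization_budget: bool  # normalization budget


def lean_by_axis(p: ProjectProfile) -> dict[str, str]:
    """Report which pole each axis favors; "either" means the axis does not push."""
    return {
        "schema stability": "closed" if p.vocabulary_settled else "open",
        "consumer tolerance": "closed" if p.automated_consumer else "either",
        "recall sensitivity": "open" if p.recall_critical else "either",
        "normalization budget": "closed" if p.low_normalization_budget else "either",
    }
```

A project with a settled vocabulary, an automated consumer, recall-critical requirements, and a tight cleanup budget gets three "closed" leans and one "open" lean. Mixed results like that are the normal case, and they are what hybrid designs try to reconcile.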

Hybrid Approaches and Why They Win in Practice

Most mature systems land in the middle. They run a closed schema for the entities and relationships they care about most, and an open pass to surface candidates for schema expansion.

Seeded open extraction

You provide the model a partial ontology and explicitly invite it to propose new types when the text demands. You get the predictability of a closed schema with a release valve for genuine novelty. The proposed types feed a review queue rather than the production graph directly.
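A seeded open pass needs two pieces: a prompt that names the seed types while permitting new ones, and routing that keeps proposals out of production. The prompt wording, KNOWN_RELATIONS, and the JSON array output format below are all assumptions for illustration; the routing logic is the part that matters.

```python
import json

# Hypothetical partial ontology; the model may go beyond it, but only via review.
KNOWN_RELATIONS = {"works_for", "headquartered_in"}


def build_prompt(text: str) -> str:
    """Illustrative prompt: prefer the seed types, but allow new ones when needed."""
    return (
        "Extract relationships as a JSON array of objects with keys "
        "subject, relation, object.\n"
        f"Prefer these relation types: {', '.join(sorted(KNOWN_RELATIONS))}.\n"
        "If the text states a relationship none of them can express, "
        "propose a new snake_case relation type instead of forcing a bad fit.\n\n"
        f"Text:\n{text}"
    )


def route(model_output: str) -> tuple[list[dict], list[dict]]:
    """Split the model's JSON output into production triples and proposals."""
    accepted, review_queue = [], []
    for item in json.loads(model_output):
        if item.get("relation") in KNOWN_RELATIONS:
            accepted.append(item)
        else:
            review_queue.append(item)  # proposed types wait for a human, not the graph
    return accepted, review_queue
```

Anything outside the seed set, whether a deliberate proposal or a model slip, lands in review_queue. Routing on the relation name rather than on a self-reported "new type" flag means the model cannot smuggle an unknown type into the graph by forgetting to label it.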

Two-stage pipelines

The first stage extracts liberally. The second stage maps the liberal output onto your canonical ontology, dropping or flagging anything that will not map. This separates recall from precision, letting you tune each independently. The same separation underpins good evaluation, which is why this pairs naturally with the practices in Scoring Whether Your Extracted Triples Are Actually Right.
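The second stage can be as simple as a reviewed lookup table from surface predicates to canonical relations. The CANONICAL table and map_to_ontology below are hypothetical; a production mapper would be larger and might use fuzzy matching, but the shape, with mapped output on one side and flagged leftovers on the other, is the same.

```python
# Hypothetical canonical mapping from stage-1 surface predicates to ontology
# relations. In practice this table is built and maintained by reviewers.
CANONICAL = {
    "works_for": "works_for",
    "employed_by": "works_for",
    "is_employed_by": "works_for",
    "based_in": "headquartered_in",
    "headquartered_in": "headquartered_in",
}

Raw = tuple[str, str, str]


def map_to_ontology(raw_triples: list[Raw]) -> tuple[list[Raw], list[Raw]]:
    """Stage 2: map liberal stage-1 triples onto the canonical ontology.

    Returns (mapped, flagged). Flagged triples are held back for review
    instead of entering the graph."""
    mapped, flagged = [], []
    for subj, pred, obj in raw_triples:
        key = "_".join(pred.lower().split())  # "Employed by" -> "employed_by"
        if key in CANONICAL:
            mapped.append((subj, CANONICAL[key], obj))
        else:
            flagged.append((subj, pred, obj))
    return mapped, flagged
```

Given ("Ada", "employed by", "Acme"), ("Acme", "based in", "Paris"), and ("Ada", "admires", "Grace"), the first two map to works_for and headquartered_in and the third is flagged. Recall can then be measured on stage-1 output and precision on mapped output. One thing a flat table does not handle is inverse predicates: "Acme employs Ada" would need its subject and object swapped before it could map to works_for.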

How Model Choice Interacts With the Trade-off

The approach you can afford depends partly on the model. Stronger models follow a closed schema more faithfully and propose better open relationships, which widens your options.

Structured-output support

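If your model and tooling support grammar-constrained decoding or function-call output, closed-schema extraction becomes nearly free to enforce: the output can be guaranteed to parse and to use only the type names you enumerate, although nothing guarantees that the extracted facts are correct. Without that support, closed schemas leak, with the model occasionally inventing an off-schema type or emitting malformed output, and the trade-off shifts because you now pay a validation tax either way.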
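Structured-output features in many model APIs and libraries accept a JSON Schema, so one common way to enforce a closed schema is to generate that schema from the ontology. The function below is a sketch under that assumption; how the schema is passed to the model, and which JSON Schema keywords are honored, depends on your provider or library. The type lists are hypothetical.

```python
# Hypothetical ontology; enum values are the only type names the decoder may emit.
ENTITY_TYPES = ["Location", "Organization", "Person"]
RELATION_TYPES = ["headquartered_in", "works_for"]


def triples_json_schema() -> dict:
    """Build a JSON Schema whose enums restrict types to the ontology."""
    triple = {
        "type": "object",
        "properties": {
            "subject": {"type": "string"},
            "subject_type": {"type": "string", "enum": ENTITY_TYPES},
            "relation": {"type": "string", "enum": RELATION_TYPES},
            "object": {"type": "string"},
            "object_type": {"type": "string", "enum": ENTITY_TYPES},
        },
        "required": ["subject", "subject_type", "relation", "object", "object_type"],
        "additionalProperties": False,  # no extra, off-contract fields
    }
    return {
        "type": "object",
        "properties": {"triples": {"type": "array", "items": triple}},
        "required": ["triples"],
        "additionalProperties": False,
    }
```

The enums keep relation and entity types on-schema, but a flat schema like this does not tie works_for to a Person subject and an Organization object, so a domain/range check after decoding is still worthwhile.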

Context window and document length

Long documents strain both approaches. A model that loses track of entities across a long context will produce inconsistent identity regardless of schema strategy. This interacts with the deeper handling problems covered in Coreference, Long Context, and Other Graph Extraction Hard Parts.
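A common mitigation for long documents is to extract chunk by chunk and reconcile entity identity afterward. The sketch below shows the naive version of that reconciliation, keyed on normalized mention text. mention_key and merge_entities are hypothetical helpers, and string matching is a stand-in for real coreference resolution, not a substitute for it.

```python
import re


def mention_key(name: str) -> str:
    """Naive normalization: casefold, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.casefold()).split())


def merge_entities(chunk_outputs: list[list[str]]) -> dict[str, int]:
    """Give every normalized mention one id that is shared across all chunks."""
    ids: dict[str, int] = {}
    for mentions in chunk_outputs:
        for mention in mentions:
            # setdefault only assigns a new id the first time a key is seen.
            ids.setdefault(mention_key(mention), len(ids))
    return ids
```

For chunks [["Acme Corp.", "Ada Lovelace"], ["ACME Corp", "Lovelace"]] it returns {"acme corp": 0, "ada lovelace": 1, "lovelace": 2}. The two spellings of Acme merge, but "Lovelace" becomes a separate entity, which is the kind of identity break that schema strategy alone cannot fix.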

The Decision Rule

Here is a rule you can apply in a meeting.

Default to a closed schema. Open extraction is seductive, but its normalization cost is routinely underestimated, and a clean partial graph beats a messy complete one for almost every production use. Add a seeded open pass only when two conditions hold: your domain is still revealing new relationship types, and you have the review capacity to triage proposals without flooding the graph.

If you are feeding an automated downstream system, never let open output reach production unmapped. If a human is the consumer and exploration is the goal, relax the constraint and accept the noise as the price of discovery. Choosing the right tools to implement either path is the subject of Software That Turns Messy Text Into Clean Triples.
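The rule is short enough to write down as a function, which is a useful check that it is unambiguous. This is one reading of the rule as stated here; the parameter names and the order of the checks are our interpretation, not a formal specification.

```python
from enum import Enum


class Strategy(Enum):
    CLOSED = "closed schema"
    CLOSED_PLUS_SEEDED_OPEN = "closed schema + seeded open pass"
    OPEN_EXPLORATORY = "open extraction, accept the noise"


def choose_strategy(domain_still_evolving: bool, has_review_capacity: bool,
                    automated_consumer: bool, exploration_goal: bool) -> Strategy:
    """Illustrative encoding of the decision rule; parameter names are hypothetical."""
    if not automated_consumer and exploration_goal:
        # A human consumer exploring the domain can absorb the noise.
        return Strategy.OPEN_EXPLORATORY
    if domain_still_evolving and has_review_capacity:
        # Proposals go to a review queue, never straight to production.
        return Strategy.CLOSED_PLUS_SEEDED_OPEN
    return Strategy.CLOSED  # the default
```

Note that the seeded-open branch is only safe for an automated consumer because proposals go to a review queue; nothing from the open pass reaches production unmapped.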

Revisit the decision as the project matures

The right answer early in a project is rarely the right answer later. A graph that began as exploratory, with open extraction surfacing the domain's vocabulary, should harden into a closed schema once that vocabulary stabilizes. A graph that began closed may need an occasional open audit to catch relationships the original ontology never anticipated. Treat the closed-versus-open choice as a setting you revisit at each phase, not a decision you make once and inherit forever. The teams that get into trouble are usually the ones that locked in an early choice and never asked whether their project had outgrown it.

Frequently Asked Questions

Is open extraction ever the right default?

Rarely. It is the right default only when you genuinely do not know your domain's vocabulary and the graph's purpose is discovery rather than reliable query. For most production systems, the normalization burden makes a closed schema the better starting point.

Can I migrate from open to closed later?

Yes, and many teams do. They run open extraction to learn the domain, cluster the results into a candidate ontology, then lock that ontology into a closed schema. The open phase becomes a research step, not the production design.
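The clustering step can start very simply. The sketch below uses frequency counting over normalized predicates as a stand-in for clustering: anything that recurs at least min_support times becomes a candidate type. The function name and the threshold of 3 are hypothetical, and a real pipeline would merge synonyms before counting.

```python
from collections import Counter


def candidate_ontology(open_triples: list[tuple[str, str, str]],
                       min_support: int = 3) -> list[str]:
    """Propose relation types that recur across open-extraction output.

    Rare predicates are left out; a human still reviews the list
    before it is locked into a closed schema."""
    counts = Counter("_".join(pred.lower().split()) for _, pred, _ in open_triples)
    return sorted(p for p, n in counts.items() if n >= min_support)
```

Three "works for" triples plus one "Works for" and one "admires" produce ["works_for"]; the one-off "admires" stays off the candidate list until it recurs. The output is a proposal for review, not a finished ontology.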

Does a hybrid approach double my cost?

Not usually, because the open pass can run on a sample rather than every document. You use open extraction to find new types occasionally and closed extraction for the bulk of throughput, which keeps cost close to the closed-only baseline.
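Sampling is what keeps the hybrid cheap. A minimal sketch, assuming documents have stable string ids: hash each id and send a fixed fraction to the open pass. The function name and the 5% default rate are hypothetical choices.

```python
import hashlib


def in_open_sample(doc_id: str, rate: float = 0.05) -> bool:
    """Deterministically route roughly `rate` of documents to the open pass.

    Hashing the id instead of calling random.random() keeps the sample stable
    across reruns, so the same documents get audited each time."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    return bucket < rate * 2**64
```

The cost arithmetic follows directly. If a closed call costs c per document and an open call costs o, the hybrid costs about N(c + rate·o) for N documents; with rate = 0.05 and o roughly equal to c (an assumption), that is about 5% above the closed-only baseline.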

How do I know my schema is too restrictive?

Watch what the model wants to say but cannot. If a high fraction of documents contain relationships your schema cannot express, your recall is suffering silently. A periodic open audit pass reveals what the closed schema is missing.
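The audit signal can be reduced to one number: the fraction of audited documents in which the open pass found at least one relationship the closed schema cannot express. The function name and the input format, a mapping from document id to normalized relation names, are assumptions.

```python
def out_of_schema_rate(audit: dict[str, list[str]], schema_relations: set[str]) -> float:
    """Fraction of audited documents containing at least one relation
    that the closed schema cannot express."""
    if not audit:
        return 0.0
    missing = sum(
        1 for rels in audit.values() if any(r not in schema_relations for r in rels)
    )
    return missing / len(audit)
```

With four audited documents, two of which contain "mentors" or "acquired" against a schema of works_for and headquartered_in, the rate is 0.5. Tracking that number across periodic audits shows whether the schema is falling behind the domain.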

Which approach is cheaper to maintain?

Closed schemas are cheaper to maintain because the contract is stable and validation is mechanical. Open extraction shifts cost from design time to perpetual normalization, which compounds as your graph grows.

Key Takeaways

  • The core trade-off is closed-schema predictability versus open-extraction coverage, and you cannot maximize both.
  • Decide along independent axes: schema stability, consumer tolerance, recall sensitivity, and normalization budget.
  • Hybrid and two-stage pipelines capture most of the upside by separating liberal extraction from canonical mapping.
  • Model strength and structured-output support change which approach you can afford to enforce.
  • Default to a closed schema and add open extraction only when your domain is still evolving and you have review capacity.
