Most teams that want a knowledge graph already have the raw material: contracts, support tickets, research notes, product specs, transcripts. What they lack is a dependable way to convert that prose into structured entities and relationships. Prompt-driven extraction has made that conversion dramatically cheaper than the old pipeline of named-entity recognizers, relation classifiers, and hand-tuned rules. But the prompt is only one piece. Around it sits a tooling layer that decides whether your extraction is reproducible, auditable, and cheap enough to run at scale.
The tooling landscape is younger than the problem, so the categories blur together and vendors describe the same capability with five different names. This survey organizes the space into the functions that actually matter, walks through the criteria that should drive a purchase or build decision, and gives you a way to reason about trade-offs before you commit to a stack you will be maintaining for years.
The goal is not to crown a single winner. It is to help you recognize which category each tool belongs to, what it is genuinely good at, and where you will still have to write your own glue code.
The Functional Layers You Are Actually Buying
A knowledge graph extraction system is rarely one product. It is a chain of responsibilities, and most tools cover only a slice of that chain.
Orchestration and prompt management
This layer holds your prompts, versions them, and runs them against a model provider. Frameworks here let you define the extraction schema once and reuse it across documents. The valuable feature is not the abstraction itself but versioning: when you change a prompt, you want to know which graph nodes were produced by which prompt revision.
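One way to make that traceability concrete is to derive a revision id from the prompt text itself and stamp it on every node the prompt produces. The sketch below is a minimal illustration, not any particular framework's API; `PromptVersion`, `stamp_node`, and the field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """A named prompt whose revision id is derived from its text (hypothetical type)."""
    name: str
    template: str

    @property
    def revision(self) -> str:
        # Content hash: any edit to the template yields a different revision id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


def stamp_node(node: dict, prompt: PromptVersion) -> dict:
    """Return a copy of the node tagged with the prompt revision that produced it."""
    return {**node, "prompt_name": prompt.name, "prompt_revision": prompt.revision}


v1 = PromptVersion("contract_parties", "List every party named in the contract:\n{text}")
v2 = PromptVersion("contract_parties", "List every legal entity named in the contract:\n{text}")

node = stamp_node({"id": "acme", "type": "Organization"}, v1)
```

With the revision stored on each node, finding everything produced by an older prompt becomes a simple filter on `prompt_revision`.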
Structured-output enforcement
The model will happily return prose when you asked for JSON. Tools in this category constrain the output to a schema using function calling, JSON mode, or grammar-constrained decoding. These are not equivalent: a plain JSON mode typically guarantees only syntactically valid JSON, not conformance to your schema, so a validation step still matters. This is the single biggest reliability lever in the whole stack, and it is worth choosing a model and library that support it natively rather than bolting on a regex parser afterward.
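Even with native enforcement, a validation gate on your side is cheap insurance. The following is a minimal sketch using only the standard library; the ontology sets, the `entities`/`relations` payload shape, and the function names are assumptions made for illustration, not a format any specific tool emits.

```python
import json

# Hypothetical ontology; substitute your own entity and relation types.
ENTITY_TYPES = {"Organization", "Person", "Contract"}
RELATION_TYPES = {"PARTY_TO", "EMPLOYED_BY", "SUPPLIES"}


def _member(value, allowed) -> bool:
    # Guard with isinstance so unhashable values (lists, dicts) fail cleanly.
    return isinstance(value, str) and value in allowed


def parse_extraction(raw: str) -> dict:
    """Parse model output; raise ValueError if it does not match the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    entities, relations = data.get("entities"), data.get("relations")
    if not isinstance(entities, list) or not isinstance(relations, list):
        raise ValueError("'entities' and 'relations' must both be lists")

    known_ids = set()
    for entity in entities:
        if not isinstance(entity, dict) or not isinstance(entity.get("id"), str):
            raise ValueError(f"entity needs a string id: {entity!r}")
        if not _member(entity.get("type"), ENTITY_TYPES):
            raise ValueError(f"unknown entity type: {entity!r}")
        known_ids.add(entity["id"])

    for relation in relations:
        if not isinstance(relation, dict):
            raise ValueError(f"relation must be an object: {relation!r}")
        if not _member(relation.get("type"), RELATION_TYPES):
            raise ValueError(f"unknown relation type: {relation!r}")
        # Every edge must connect two entities declared in the same payload.
        if not (_member(relation.get("source"), known_ids)
                and _member(relation.get("target"), known_ids)):
            raise ValueError(f"relation references an unknown entity: {relation!r}")
    return data
```

The design choice here is to fail loudly: a rejected payload can be retried or sent to review, whereas a silently accepted one ends up in the graph.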
Graph storage and identity resolution
Extracted triples have to land somewhere. A property graph database or an RDF store gives you query power, but the harder job is entity resolution: deciding that "Acme Corp," "Acme Corporation," and "ACME" are the same node. Some tools fold this in; most leave it to you.
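For the Acme example above, even a naive normalization pass catches the easy cases. This sketch assumes a small, hand-picked list of legal suffixes and exact matching on the normalized key; real entity resolution usually also needs fuzzy matching and context, which this deliberately leaves out.

```python
import re
from collections import defaultdict

# Illustrative suffix list (an assumption); extend it for your own corpus.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated",
                  "ltd", "limited", "llc", "co", "company"}


def canonical_key(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.casefold()).split()
    # Keep at least one token so a name like "Co" does not collapse to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_mentions(mentions):
    """Group surface forms that share a canonical key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical_key(mention)].append(mention)
    return dict(groups)


groups = group_mentions(["Acme Corp", "Acme Corporation", "ACME", "Globex Inc."])
```

Here the three Acme variants land under the single key `acme`, while `Globex Inc.` gets its own group.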
Selection Criteria That Actually Discriminate
Vendor feature lists are mostly noise. A short set of criteria does most of the discriminating.
- Schema fidelity. Can the tool guarantee the output matches your ontology, or does it merely suggest a shape? Guaranteed beats suggested every time you run at volume.
- Provenance tracking. Every edge in a serious graph needs to point back to the source span that produced it. Tools without provenance leave you unable to debug or defend your data.
- Cost transparency. Token-heavy extraction gets expensive fast. You want a tool that surfaces per-document cost, not one that hides it inside a monthly bill.
- Human-in-the-loop hooks. The best systems make it trivial to route low-confidence extractions to a reviewer. If correction is painful, your graph quality stays low.
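Three of these criteria (provenance, cost transparency, and review routing) translate into very little code once the data model carries the right fields. The sketch below is one possible shape, with a hypothetical `Edge` record, an arbitrary 0.8 confidence threshold, and per-token prices passed in as parameters rather than taken from any real price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """An extracted relationship plus the source span that supports it."""
    source: str
    relation: str
    target: str
    doc_id: str
    span_start: int   # character offsets into the source document
    span_end: int
    confidence: float


def evidence(edge: Edge, documents: dict) -> str:
    """Return the exact text the edge was extracted from."""
    return documents[edge.doc_id][edge.span_start:edge.span_end]


def route(edges, threshold: float = 0.8):
    """Split edges into auto-accepted ones and ones queued for human review."""
    accepted = [e for e in edges if e.confidence >= threshold]
    review = [e for e in edges if e.confidence < threshold]
    return accepted, review


def document_cost(input_tokens: int, output_tokens: int,
                  usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Per-document cost from token counts and caller-supplied prices."""
    return (input_tokens / 1000 * usd_per_1k_input
            + output_tokens / 1000 * usd_per_1k_output)


text = "Acme Corp agrees to supply Globex with widgets."
documents = {"c-17": text}
end = text.index("Globex") + len("Globex")
edges = [
    Edge("Acme Corp", "SUPPLIES", "Globex", "c-17", 0, end, 0.93),
    Edge("Globex", "OWNS", "widgets", "c-17", 0, len(text), 0.41),
]
accepted, review = route(edges)
```

Calling `evidence(accepted[0], documents)` returns the sentence fragment behind the accepted edge, which is what makes an edge debuggable and defensible later.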
Build Versus Buy in Practice
The honest answer is that almost everyone does both. You buy the storage layer and the model access, and you build the extraction logic that encodes your domain.
When buying wins
If your ontology is generic and your volume is moderate, a managed extraction service or an off-the-shelf framework will get you to a working graph in days. Paying for someone else's structured-output reliability is usually a bargain.
When building wins
Domain-specific ontologies, strict provenance requirements, and high volume push you toward building. The custom prompt and validation logic become a genuine competitive asset, and you do not want them locked inside a vendor's black box. Teams weighing this decision will find the long-form treatment in "When Strict Schemas Beat Open-Ended Graph Extraction" useful for framing the architectural fork.
Evaluating Open-Source Frameworks
Open-source options dominate the prompt-orchestration and structured-output layers. They give you control and avoid lock-in, at the cost of maintenance.
What to inspect before adopting
- The schema definition format and whether it supports nested entities and typed relationships.
- Whether retries on malformed output are automatic and bounded.
- The quality of the validation layer that sits between the model and your store.
- Community momentum, because an abandoned extraction library becomes a liability the moment a model provider changes its API.
A framework that handles validation poorly will silently corrupt your graph, and corruption in a graph compounds because downstream queries assume the data is clean.
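The "automatic and bounded" retry behavior in the checklist above can be sketched in a few lines. This is an assumption about what a reasonable implementation looks like, not how any particular framework does it; `call_model` stands in for whatever client function you use, and the fake model exists only to make the example runnable.

```python
import json
from typing import Callable


class ExtractionFailed(Exception):
    """Raised when every attempt produced malformed or invalid output."""


def require_triples(data) -> None:
    """Tiny stand-in validator: the payload must contain a list of triples."""
    if not isinstance(data, dict) or not isinstance(data.get("triples"), list):
        raise ValueError("payload must be an object with a 'triples' list")


def extract_with_retries(call_model: Callable[[str], str], prompt: str,
                         validate: Callable[[object], None],
                         max_attempts: int = 3) -> dict:
    """Call the model until output validates, giving up after max_attempts."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)   # JSONDecodeError subclasses ValueError
            validate(data)
            return data
        except ValueError as exc:
            last_error = exc         # remember why this attempt was rejected
    raise ExtractionFailed(f"gave up after {max_attempts} attempts: {last_error}")


def make_fake_model(responses):
    """Hypothetical model stub that replays canned responses in order."""
    remaining = iter(responses)
    return lambda prompt: next(remaining)


flaky = make_fake_model(["Here is your JSON:", '{"triples": [["Acme", "SUPPLIES", "Globex"]]}'])
result = extract_with_retries(flaky, "Extract triples.", require_triples)
```

The bound matters as much as the retry: without `max_attempts`, a document the model cannot handle turns into an unbounded bill.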
Managed Services and Their Boundaries
Hosted extraction services promise to remove the plumbing. They genuinely do for common cases. The boundary to watch is customization: the moment your ontology diverges from what the service expects, you hit a wall.
Questions to ask a vendor
- Can I supply my own ontology, or must I map to yours?
- Do you return source spans for every extracted relationship?
- How do you handle documents longer than the model context window?
- What happens to my data, and can I export the full graph?
The export question matters more than it looks. A graph you cannot fully export is a graph you do not really own.
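On the context-window question, one common answer is overlapping chunks that remember where they came from, so extracted spans can still be mapped back to the original document. The sketch below is one such approach and an assumption on my part, not a description of any vendor's method; it counts characters for simplicity, whereas a real system would count tokens.

```python
def chunk_with_offsets(text: str, size: int = 1000, overlap: int = 200):
    """Split text into overlapping windows, keeping each window's start offset."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + size]))
        if start + size >= len(text):
            break                      # this window already reached the end
        start += size - overlap        # step forward, re-reading the overlap
    return chunks


def to_document_span(chunk_start: int, local_start: int, local_end: int):
    """Translate a span found inside a chunk back to document coordinates."""
    return chunk_start + local_start, chunk_start + local_end


text = "abcdefghij" * 3
chunks = chunk_with_offsets(text, size=12, overlap=4)
```

The overlap reduces the chance that a relationship straddling a chunk boundary is missed, at the cost of some duplicate extractions that then need deduplicating.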
Assembling a Stack That Holds Up
A durable stack pairs a structured-output-capable model with a versioned prompt layer, a validation gate, and a graph store with provenance. Each piece is replaceable, which is the point: you want to swap the model when a better one ships without rewriting your ontology.
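Replaceability is easiest to keep when the model sits behind a narrow interface and the ontology lives outside it. The following sketch illustrates the idea with hypothetical names (`Model`, `Pipeline`, and two stub models); it is not a recommendation of a specific architecture.

```python
from dataclasses import dataclass, field
from typing import Protocol

Triple = tuple[str, str, str]


class Model(Protocol):
    """Anything that can turn text into candidate triples."""
    def extract(self, text: str, ontology: frozenset[str]) -> list[Triple]: ...


@dataclass
class Pipeline:
    model: Model
    ontology: frozenset[str]            # allowed relation types, owned by you
    store: list[Triple] = field(default_factory=list)

    def run(self, text: str) -> int:
        candidates = self.model.extract(text, self.ontology)
        # Validation gate: only relations in the ontology reach the store.
        valid = [t for t in candidates if t[1] in self.ontology]
        self.store.extend(valid)
        return len(valid)


class StubModelA:
    def extract(self, text, ontology):
        return [("Acme", "SUPPLIES", "Globex"), ("Acme", "LIKES", "Globex")]


class StubModelB:
    def extract(self, text, ontology):
        return [("Initech", "SUPPLIES", "Globex")]


ONTOLOGY = frozenset({"SUPPLIES", "PARTY_TO"})
pipeline = Pipeline(model=StubModelA(), ontology=ONTOLOGY)
first = pipeline.run("some contract text")
pipeline.model = StubModelB()           # swap the model; ontology is untouched
second = pipeline.run("another contract")
```

Because the validation gate depends only on the ontology, swapping `StubModelA` for `StubModelB` changes nothing else in the pipeline.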
The teams that succeed treat extraction as a pipeline with measurable quality, not a one-off script. That mindset connects directly to "Scoring Whether Your Extracted Triples Are Actually Right," because tooling choices only pay off if you can measure their effect.
Plan for the pieces no tool provides
Whatever stack you assemble, a few responsibilities will fall to you regardless of vendor promises: encoding your specific domain into an ontology, deciding your quality thresholds, and routing low-confidence output to review. Treat these as first-class parts of the system rather than afterthoughts. The teams that get burned are usually the ones that assumed a tool would handle the domain logic, then discovered too late that the hardest, most valuable work was always going to be theirs to own.
Frequently Asked Questions
Do I need a graph database, or can I use a relational store?
You can start with a relational store, and many teams do. A graph database earns its place once your queries traverse multiple hops, such as finding all suppliers connected to a customer through intermediate contracts. If your access patterns stay shallow, the operational overhead of a graph store may not be worth it yet.
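To see what a multi-hop query involves, here is a minimal breadth-first traversal over an in-memory edge list, using the supplier example. The node names are made up and edges are treated as undirected for simplicity; a graph database would express the same thing declaratively, while a relational store would need a self-join or recursive query per hop.

```python
from collections import defaultdict, deque


def reachable_within(edges, start, max_hops):
    """Return {node: hop_count} for nodes within max_hops of start."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)            # treat edges as undirected

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue                   # do not expand past the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


edges = [
    ("customer:Globex", "contract:C1"),
    ("contract:C1", "supplier:Acme"),
    ("customer:Globex", "contract:C2"),
    ("contract:C2", "supplier:Initech"),
    ("supplier:Initech", "contract:C3"),
]
found = reachable_within(edges, "customer:Globex", max_hops=2)
suppliers = sorted(n for n in found if n.startswith("supplier:"))
```

With a two-hop limit, both suppliers are found through their intermediate contracts, while `contract:C3`, three hops away, is not.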
Is a dedicated extraction framework better than calling the model directly?
For a prototype, direct calls are fine. For anything you maintain, a framework pays for itself by handling schema enforcement, retries, and versioning. The risk of going framework-free is that you reinvent all of that, badly, under deadline pressure.
How important is structured-output enforcement really?
It is the difference between a system that works and one that works until it does not. Without enforcement, a small fraction of outputs will be malformed, and at scale that small fraction becomes thousands of corrupt edges. Treat it as non-negotiable.
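The arithmetic behind that claim is worth running with your own numbers. The figures below are purely illustrative assumptions (one million documents, a 0.5% malformed rate, four edges per output), not measurements.

```python
def expected_corrupt_edges(documents: int, malformed_rate: float,
                           edges_per_output: int) -> int:
    """Expected number of bad edges if malformed outputs are written unchecked."""
    return round(documents * malformed_rate * edges_per_output)


# Illustrative inputs only: 1,000,000 documents, 0.5% malformed, 4 edges each.
corrupt = expected_corrupt_edges(1_000_000, 0.005, 4)
```

Under those assumptions, a failure rate that looks negligible in a prototype works out to 20,000 corrupt edges.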
Can one tool cover the entire pipeline?
A few claim to, and for generic ontologies they come close. The closer you get to a specialized domain, the more you will assemble best-of-breed pieces yourself. Plan for a stack, not a single product.
How do I avoid vendor lock-in?
Keep your ontology and prompts in your own repository, and choose tools that export the full graph in a standard format. If switching vendors means re-extracting from scratch, you are locked in regardless of what the contract says.
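As a rough test of exportability, check whether you can dump the whole graph to a plain, standard serialization. The sketch below writes entity-to-entity triples as N-Triples lines; the `https://example.org/kg/` namespace is a placeholder, and literals, datatypes, and provenance metadata are left out for brevity.

```python
from urllib.parse import quote

BASE = "https://example.org/kg/"   # placeholder namespace (an assumption)


def iri(local_name: str) -> str:
    """Build an IRI reference, percent-encoding characters IRIs cannot contain."""
    return f"<{BASE}{quote(local_name, safe='')}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    lines = [f"{iri(s)} {iri(p)} {iri(o)} .\n" for s, p, o in sorted(triples)]
    return "".join(lines)


dump = to_ntriples([
    ("Globex", "partyTo", "Contract C1"),
    ("Acme Corp", "supplies", "Globex"),
])
```

If a vendor can only hand back a proprietary dump, it is worth asking how much work it would take to get from that dump to something this simple.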
Key Takeaways
- Knowledge graph extraction is a pipeline of orchestration, structured-output enforcement, storage, and identity resolution, not a single product.
- Schema fidelity, provenance tracking, cost transparency, and human-in-the-loop hooks are the criteria that actually separate tools.
- Buy the generic layers and build the domain-specific extraction logic; nearly everyone ends up doing both.
- Structured-output enforcement is the highest-leverage reliability decision in the entire stack.
- Protect yourself from lock-in by keeping your ontology and prompts portable and insisting on full graph export.