Knowledge graph extraction sounds like one of those tasks language models were born for. Feed in a paragraph of unstructured text, ask for entities and the relationships between them, and receive clean triples ready to load into a graph database. The demos are convincing. The reality, once you run the same prompt across ten thousand documents, is messier.
Most of the trouble comes from beliefs that are half true. A technique works on the examples someone tested, so it gets written up as a rule. Then it travels through blog posts and conference talks until it hardens into received wisdom. By the time you inherit the advice, the conditions that made it work have been stripped away. The result is a pipeline that looks reasonable and produces a graph nobody trusts.
This article walks through the most common misconceptions about prompting for knowledge graph extraction, explains why each one feels right, and describes what actually happens when you push the technique past a handful of curated examples.
Myth: A Bigger Model Solves Extraction Quality
Why the belief takes hold
When extraction is sloppy, the first instinct is to reach for a more capable model. Sometimes that helps. A stronger model does recognize more entity types and handles ambiguous phrasing better. So people conclude that quality is mostly a function of model size and stop investigating.
What actually happens
The dominant source of error in extraction is not reasoning capacity. It is schema ambiguity. If your prompt does not tell the model whether "Acme acquired Beta" should produce an acquired relationship, an owns relationship, or both, a bigger model will simply make a more confident guess. It will also make a different guess on the next document, which is worse than being consistently wrong: a consistent mistake can be corrected with one mapping rule, while an inconsistent one has to be found and fixed edge by edge.
The fix is rarely a model upgrade. It is a tighter specification of the target schema, a controlled vocabulary for relationship types, and a few worked examples that show the model how to handle the edge cases you care about. Once the schema is pinned down, the gap between mid-tier and top-tier models often shrinks to noise.
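As a minimal sketch of what pinning down the schema can look like, the snippet below defines a closed relationship vocabulary, renders it into prompt instructions, and rejects any triple that falls outside it. The relation names, the edge-case rule, and the function names are illustrative assumptions, not a recommended schema.

```python
from enum import Enum


class Relation(str, Enum):
    """Closed vocabulary of relationship types (illustrative, not exhaustive)."""
    ACQUIRED = "acquired"
    OWNS = "owns"
    PARTNERS_WITH = "partners_with"
    SUPPLIES = "supplies"


def build_schema_instructions() -> str:
    """Render the vocabulary plus one explicit edge-case rule for the prompt."""
    allowed = ", ".join(r.value for r in Relation)
    return (
        f"Use only these relationship types: {allowed}.\n"
        "If the text says one company acquired another, emit 'acquired' only; "
        "do not also emit 'owns'."
    )


def validate_triple(subject: str, relation: str, obj: str) -> tuple[str, Relation, str]:
    """Reject any triple whose relation is outside the closed vocabulary."""
    try:
        rel = Relation(relation.strip().lower())
    except ValueError:
        raise ValueError(f"relation {relation!r} is not in the schema") from None
    return subject, rel, obj
```

Keeping the vocabulary in one place means the prompt text and the validator cannot drift apart, which is one way to keep the "acquired versus owns" decision from being remade on every document.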
Myth: One Prompt Can Handle the Whole Document
The appeal of a single pass
Sending an entire document and asking for the complete graph in one response is tidy. It keeps the code simple and the token accounting easy to reason about. For short, well-structured text it works fine.
Where it breaks
Long documents strain a single extraction pass in two ways. First, the model tends to drop entities that appear late in the text, because recall over a long input is uneven. Second, relationships that span distant sentences are often missed, because the two mentions sit far apart in the context and nothing in the prompt asks the model to connect them.
A more reliable pattern decomposes the task. Extract entities first, resolve them to canonical forms, then run a second pass that asks only about relationships among the already-identified entities. This is slower and costs more tokens, but recall usually improves and the output becomes auditable, because each stage can be inspected on its own. If you are weighing this trade-off, our piece on Building a Repeatable Workflow for Prompting for Knowledge Graph Extraction lays out the staging in detail.
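The staged pattern can be sketched roughly as follows. The `llm` argument is a hypothetical callable that takes a prompt string and returns the model's text; the prompt wording and function names are assumptions, and the canonicalization step between the two passes is left out to keep the sketch short.

```python
import json
from typing import Callable

# Hypothetical model interface: prompt in, raw text out.
LLM = Callable[[str], str]


def extract_entities(text: str, llm: LLM) -> list[str]:
    """Pass 1: ask only for entities, then de-duplicate them."""
    prompt = f"List every entity in the text as a JSON array of strings.\n\nText:\n{text}"
    return sorted(set(json.loads(llm(prompt))))


def extract_relations(text: str, entities: list[str], llm: LLM) -> list[dict]:
    """Pass 2: ask only about relationships among the known entities."""
    prompt = (
        "Using ONLY these entities, return a JSON array of objects with keys "
        f"subject, relation, object.\nEntities: {json.dumps(entities)}\n\nText:\n{text}"
    )
    triples = json.loads(llm(prompt))
    known = set(entities)
    # Drop any triple that mentions an entity the first pass did not produce.
    return [t for t in triples if t["subject"] in known and t["object"] in known]


def extract_graph(text: str, llm: LLM) -> dict:
    """Run both passes and return the intermediate and final results together."""
    entities = extract_entities(text, llm)
    return {"entities": entities, "relations": extract_relations(text, entities, llm)}
```

Because the entity list is returned alongside the relations, a reviewer can see which stage produced a bad edge, which is where the auditability comes from.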
Myth: Few-Shot Examples Always Improve Results
The reasonable starting point
Few-shot prompting genuinely helps the model understand your schema and formatting. Showing two or three examples of input text paired with the desired triples anchors the output format and reduces hallucinated relationship types.
The hidden cost
Examples also bias the model toward the patterns they contain. If all your examples involve corporate acquisitions, the model starts seeing acquisitions everywhere, even in documents about partnerships or supplier relationships. The examples that taught the format also taught a prior that distorts extraction on dissimilar text.
The corrective is to diversify examples across the relationship types you expect, and to periodically run the prompt with zero examples to see whether the few-shot set is helping or merely steering. Treat your example set as a tunable component, not a fixed asset.
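One cheap way to treat the example set as tunable is to measure how lopsided it is before it ever reaches the model. The sketch below assumes each few-shot example is a dict with a `triples` list whose items carry a `relation` key; that shape, the function names, and the 50% threshold are all illustrative assumptions.

```python
from collections import Counter


def relation_shares(examples: list[dict]) -> dict[str, float]:
    """Fraction of all example triples that use each relationship type."""
    counts = Counter(t["relation"] for ex in examples for t in ex["triples"])
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()} if total else {}


def skewed_relations(examples: list[dict], max_share: float = 0.5) -> list[str]:
    """Relationship types that dominate the example set beyond max_share."""
    shares = relation_shares(examples)
    return sorted(rel for rel, share in shares.items() if share > max_share)
```

A non-empty result from `skewed_relations` does not prove the prompt is biased, but it is a reasonable trigger for the zero-shot comparison described above.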
Myth: JSON Output Means Reliable Output
Why structure feels like safety
Asking for JSON output, and validating it against a schema, catches a real class of errors. Malformed responses get rejected, and you avoid downstream parsing failures. It is a genuine improvement over free text.
What structure cannot guarantee
A perfectly valid JSON object can still contain a wrong relationship, a misattributed entity, or a confidently invented fact. Schema validation checks shape, not truth. Teams that lean on structured output sometimes stop checking content because the pipeline "passes," which is exactly when silent errors accumulate in the graph.
Pair structural validation with content checks: verify that extracted entities actually appear in the source text and that relationship directions are consistent, and route any extraction whose confidence falls below a threshold to human review. Our companion article on What People Get Wrong About Controlling Formality and Register in Output makes a parallel point about how surface compliance can mask real defects.
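A rough sketch of those content checks follows. It assumes each triple is a dict with `subject`, `relation`, `object`, and a model-reported `confidence` score; the field names, the 0.7 threshold, and the set of asymmetric relations are assumptions for illustration, and the substring match is deliberately naive.

```python
# Assumption: relations for which A->B and B->A cannot both be true.
ASYMMETRIC = {"acquired", "owns"}


def review_triple(triple: dict, source: str, min_confidence: float = 0.7) -> str:
    """Return 'accept', 'review', or 'reject' for one extracted triple."""
    text = source.casefold()
    # Grounding check: both entities must literally appear in the source.
    grounded = all(triple[k].casefold() in text for k in ("subject", "object"))
    if not grounded:
        return "reject"
    # Missing or low confidence goes to a human rather than into the graph.
    if triple.get("confidence", 0.0) < min_confidence:
        return "review"
    return "accept"


def direction_conflicts(triples: list[dict]) -> list[tuple[str, str, str]]:
    """Find asymmetric relations asserted in both directions."""
    seen = {(t["subject"], t["relation"], t["object"]) for t in triples}
    return sorted(
        (s, r, o)
        for (s, r, o) in seen
        if r in ASYMMETRIC and (o, r, s) in seen and s < o
    )
```

Both functions run after schema validation has already passed, so they only ever see well-formed triples and can focus on whether the content is plausible.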
Myth: Temperature Zero Makes Extraction Deterministic
The intuitive expectation
Setting temperature to zero is supposed to make the model deterministic, which sounds ideal for extraction. Run the same document twice, get the same graph, and your pipeline is reproducible.
The uncomfortable reality
Temperature zero reduces variance but does not eliminate it: outputs can still shift across model versions, infrastructure changes, or even minor prompt reformatting. More importantly, low temperature makes the model commit hard to its first interpretation, which means a single ambiguous phrasing produces a single confident answer with no signal that other readings existed.
For extraction, a small amount of sampling combined with multiple passes can actually surface ambiguity. If three runs produce three different relationship labels, that disagreement is information: the document is genuinely ambiguous and deserves review. Determinism hides that signal.
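The disagreement signal can be computed with very little code. In this sketch each run is a list of `(subject, relation, object)` tuples from one sampled pass over the same document; the function names and the default requirement of unanimous agreement are assumptions.

```python
from collections import Counter

Triple = tuple[str, str, str]


def label_agreement(runs: list[list[Triple]]) -> dict[tuple[str, str], Counter]:
    """Count, per (subject, object) pair, how many runs chose each relation."""
    labels: dict[tuple[str, str], Counter] = {}
    for run in runs:
        for s, r, o in set(run):  # de-duplicate within a single run
            labels.setdefault((s, o), Counter())[r] += 1
    return labels


def ambiguous_pairs(runs: list[list[Triple]], min_agreement: float = 1.0) -> list[tuple[str, str]]:
    """Pairs whose most common label appears in too few runs."""
    flagged = []
    for pair, counts in label_agreement(runs).items():
        top = counts.most_common(1)[0][1]
        if top / len(runs) < min_agreement:
            flagged.append(pair)
    return sorted(flagged)
```

A pair is flagged both when runs disagree on the label and when some runs omit the pair entirely, since either case suggests the text does not clearly support a single reading.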
Myth: Entity Resolution Is a Separate Problem
The clean mental model
It is tempting to treat extraction and entity resolution as distinct stages owned by different systems. The model extracts mentions; a downstream service decides that "IBM," "I.B.M.," and "International Business Machines" are the same node.
Why coupling matters
In practice, resolution decisions feed back into extraction quality. If the model knows that two mentions refer to the same entity, it produces cleaner relationships. If it does not, you get duplicate nodes with fragmented edges, and merging them later is lossy. The strongest pipelines give the model a running list of already-resolved entities and ask it to extend that list rather than start fresh each time, a technique discussed alongside other staging choices in The Prompting for Knowledge Graph Extraction Playbook.
Frequently Asked Questions
Do I need a fine-tuned model for knowledge graph extraction?
Usually not at first. A well-specified schema, a controlled relationship vocabulary, and a diverse few-shot set get most teams to a usable baseline with a general-purpose model. Fine-tuning earns its keep when you have a stable schema, a large volume of domain-specific text, and labeled data, at which point it improves consistency and lowers per-call cost. Reach for it after prompt engineering plateaus, not before.
How do I stop the model from inventing relationships?
Constrain it. Provide an explicit, closed list of allowed relationship types and instruct the model to use only those. Require that every extracted triple cite the span of source text supporting it, then verify that the span exists. Invented relationships tend to drop when the model has to point at supporting text, and any triple whose cited span cannot be found can be discarded automatically.
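Span verification is straightforward to sketch. The example assumes each triple carries an `evidence` field holding the quoted source span; that field name and the whitespace-only normalization are assumptions, and a production check might also tolerate minor quoting differences.

```python
def normalize_ws(s: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return " ".join(s.split())


def verify_evidence(triples: list[dict], source: str) -> tuple[list[dict], list[dict]]:
    """Split triples into those whose cited span occurs in the source and the rest."""
    haystack = normalize_ws(source)
    supported, unsupported = [], []
    for t in triples:
        span = normalize_ws(t.get("evidence", ""))
        # An empty or missing span counts as unsupported.
        (supported if span and span in haystack else unsupported).append(t)
    return supported, unsupported
```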
Is a single end-to-end prompt ever the right choice?
Yes, for short, structurally simple documents where recall is less critical than speed. A single pass is cheaper and simpler to operate. The moment documents grow long or relationships span paragraphs, decomposition into entity and relationship stages pays for itself in recall and auditability.
Why does my extracted graph have so many duplicate nodes?
Almost always because entity resolution is happening too late or not at all. The model extracts surface mentions and treats each variant spelling as a new entity. Feeding the model a canonical entity list during extraction, or resolving aggressively immediately after, prevents fragmentation that is painful to repair downstream.
Should I validate output with a schema?
Always validate shape, but never stop there. Schema validation guarantees parseable, well-formed output and nothing about correctness. Layer content checks on top: span verification, direction consistency, and confidence thresholds that route uncertain extractions to human review.
Key Takeaways
- Most extraction failures come from schema ambiguity, not model weakness; specify your relationship vocabulary before reaching for a bigger model.
- Long documents need decomposed extraction (entities first, then relationships) to maintain recall and auditability.
- Few-shot examples both teach format and impose bias; diversify them and test against zero-shot periodically.
- Valid JSON proves structure, not truth; pair schema validation with span verification and confidence-based review.
- Treat entity resolution as coupled to extraction, not a separate downstream cleanup, to avoid fragmented graphs.