There is no shortage of generic advice about reducing hallucinations. "Be specific." "Provide context." Most of it is true and useless, because it does not tell you what to actually do or why it works. This article takes the opposite stance: a set of opinionated practices, each with the reasoning that earns it a place and the trade-off it carries.
These are not rules to follow blindly. They are positions arrived at by watching prompts fail in production and figuring out what reliably fixed them. Where a practice has a downside, we say so, because a practice you apply without understanding its cost is a practice you will misapply.
If you want the foundational concepts first, start with Stop Your Model From Inventing Facts at the Prompt Layer. If you already know the basics, read on.
Default to Grounding, Treat Memory as a Last Resort
The strongest practice is to assume the model's memory is unreliable for any specific fact and design around that assumption.
Why this is the right default
Parametric memory is lossy, dated, and confidently wrong on specifics. Every factual answer drawn from memory is a guess wearing the costume of a fact. Grounding the model in supplied text converts the guess into a lookup.
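As a minimal sketch of what "converting the guess into a lookup" can look like at the prompt layer, the function below assembles a prompt that restricts the model to supplied passages. The function name, the instruction wording, and the bracketed numbering scheme are illustrative assumptions, not a prescribed format; how the passages are retrieved is out of scope here.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied passages.

    Hypothetical helper: the wording and numbering are one possible choice.
    """
    if not passages:
        # Without source text there is nothing to ground in.
        raise ValueError("grounding needs at least one source passage")
    # Number each passage so later steps can refer to it by index.
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages below. "
        "Do not rely on prior knowledge.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
    )
```

Refusing to build a prompt with no passages is a deliberate choice in this sketch: it makes the fallback to memory an explicit decision rather than a silent default.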
The trade-off
Grounding requires you to have and supply the source material, which adds retrieval infrastructure and prompt length. For tasks where no source exists, you fall back to memory and accept higher risk, so reserve those tasks for low-stakes use.
Make Abstention a First-Class Outcome
Treat "I do not know" as a valid, desirable answer, not a failure state to be engineered away.
Why this matters
A model with no exit answers everything, including questions it cannot support, and fills gaps with invention. Granting explicit, concrete permission to abstain is one of the highest-leverage single lines you can add to a prompt.
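One way to make the permission explicit and concrete is to give the model an exact string to return when it cannot answer, which also makes abstentions easy to detect downstream. The sentinel value and clause wording below are assumptions chosen for illustration.

```python
# Hypothetical sentinel: any fixed string that cannot be mistaken for an answer.
ABSTAIN_TOKEN = "NOT_IN_SOURCE"

ABSTENTION_CLAUSE = (
    "If the passages do not contain the answer, reply with exactly "
    f"{ABSTAIN_TOKEN} and nothing else. "
    "This is a correct and acceptable answer."
)


def with_abstention(prompt: str) -> str:
    """Append the abstention clause to an existing prompt."""
    return f"{prompt.rstrip()}\n\n{ABSTENTION_CLAUSE}\n"


def is_abstention(reply: str) -> bool:
    """True only when the reply is the sentinel, ignoring outer whitespace."""
    return reply.strip() == ABSTAIN_TOKEN
```

Matching the sentinel exactly, rather than searching for it inside a longer reply, keeps a hedged half-answer from being counted as a clean abstention.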
The trade-off
Push abstention too hard and the model refuses questions it could have answered, frustrating users. The practice is to calibrate, measuring unnecessary refusals alongside fabrications, not to maximize abstention. This balance is covered in Build a Fabrication-Resistant Prompt in Eight Moves.
Require Evidence, Then Use the Absence of It
Demand that every claim cite the source passage that supports it, and treat any claim without support as a signal to abstain.
Why this works
A citation requirement forces a self-check before the model commits. When a claim has no supporting passage, the gap becomes visible, and a well-prompted model abstains rather than fabricating to fill it.
The trade-off
Models can fabricate citations too, quoting passages that do not actually support the claim. Evidence requirements reduce fabrication but do not eliminate the need for verification, especially on high-stakes output.
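A cheap first check, sketched below, is to confirm that each quoted passage really appears in the source. The `Claim` structure and function names are hypothetical, and the check is deliberately narrow: it catches invented or missing quotes, but it cannot tell whether a genuine quote actually supports the claim, so it complements verification rather than replacing it.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str             # the statement the model made
    quote: Optional[str]  # the passage it cited, or None if it cited nothing


def _normalize(s: str) -> str:
    # Collapse whitespace and ignore case so line wraps do not cause misses.
    return re.sub(r"\s+", " ", s).strip().lower()


def unsupported_claims(claims: list[Claim], source: str) -> list[Claim]:
    """Return claims whose quote is missing or not found verbatim in source."""
    haystack = _normalize(source)
    flagged = []
    for claim in claims:
        quote = _normalize(claim.quote or "")
        if not quote or quote not in haystack:
            flagged.append(claim)
    return flagged
```

Anything this function returns is a candidate for removal or abstention; anything it passes still needs a relevance check on high-stakes output.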
Separate Generation From Verification
Do not trust a single prompt to both produce an answer and confirm it is correct.
Why the split matters
A model evaluating its own fresh output tends to rationalize rather than scrutinize. A separate verification pass, framed independently as a review of someone else's answer, can catch errors the generation step let through. In practice, two passes tend to be more reliable than one.
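The split can be sketched as two independent calls, with the verifier seeing only the source and the draft. The `Model` type is a stand-in for whatever client you use (any callable that takes a prompt and returns text); the prompt wording, the SUPPORTED/UNSUPPORTED verdict labels, and the fallback string are all assumptions for illustration.

```python
from typing import Callable

# Hypothetical interface: a callable that takes a prompt and returns text.
Model = Callable[[str], str]


def answer_with_verification(
    question: str,
    source: str,
    generate: Model,
    verify: Model,
    abstain: str = "NOT_IN_SOURCE",
) -> str:
    """Generate a draft, then have a separately framed pass check it."""
    draft = generate(
        "Using only this source, answer the question.\n\n"
        f"Source:\n{source}\n\nQuestion: {question}"
    )
    # The verifier is told it is reviewing someone else's work and is not
    # shown the generation prompt, only the source and the draft.
    verdict = verify(
        "You are reviewing an answer written by someone else. "
        "Reply SUPPORTED only if every statement in the answer is backed "
        "by the source; otherwise reply UNSUPPORTED.\n\n"
        f"Source:\n{source}\n\nAnswer:\n{draft}"
    )
    # Exact match: "UNSUPPORTED" contains "SUPPORTED" as a substring.
    if verdict.strip().upper() == "SUPPORTED":
        return draft
    return abstain
```

Because both models are passed in as callables, the same function can be exercised with stubs in tests, for example `answer_with_verification(q, src, lambda p: "Two years.", lambda p: "SUPPORTED")`.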
The trade-off
A second pass roughly doubles the cost and latency of each answer. Reserve it for tasks where a confident wrong answer causes real harm, and skip it where errors are cheap.
Constrain the Output, Not Just the Input
A tight output structure does as much to suppress fabrication as a careful input.
Why structure helps
Open-ended prose gives the model room to embellish. Defined fields, bounded lists, and required slots channel the model into the shape you want and starve the freelancing impulse. Longer output gives invention more room, so shorter, structured output tends to drift less.
The trade-off
Over-constraining can force the model to produce output it cannot support, jamming a guess into a required field. Pair structure with abstention so empty slots are allowed to stay empty.
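A minimal sketch of pairing structure with abstention: the schema below fixes which fields may appear, but every field is allowed to be null. The `InvoiceFacts` schema and its field names are hypothetical, and the sketch assumes the model has been asked to reply with a single JSON object.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InvoiceFacts:
    # Hypothetical schema. Every slot defaults to None, so "empty" is valid.
    vendor: Optional[str] = None
    total: Optional[str] = None
    due_date: Optional[str] = None


def parse_reply(raw: str) -> InvoiceFacts:
    """Parse a JSON reply, rejecting any field outside the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = {f.name for f in fields(InvoiceFacts)}
    extra = set(data) - allowed
    if extra:
        # Extra fields are where embellishment tends to show up.
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return InvoiceFacts(**data)


def missing_fields(facts: InvoiceFacts) -> list[str]:
    """Names of slots the model left empty, in schema order."""
    return [f.name for f in fields(facts) if getattr(facts, f.name) is None]
```

Here `parse_reply('{"vendor": "Acme", "total": null}')` succeeds with `total` and `due_date` left as `None`, while a reply containing an unlisted key is rejected.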
Test Where Fabrication Actually Lives
Build evaluation around the questions your source cannot answer, because that is where hallucination shows up.
Why this is non-negotiable
A prompt that handles answerable questions well tells you little about its fabrication rate. The risk is concentrated in unanswerable cases, and a prompt that abstains on all of them is doing its job.
The trade-off
Assembling and maintaining a labeled test set with unanswerable cases takes effort that feels unglamorous. But without it, every prompt change is a guess, and you will routinely fix one failure while creating another.
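The two numbers worth tracking can be computed from a small labeled set, as in the sketch below. The `Case` structure and metric names are assumptions. The scoring is also simplified: any non-abstaining reply to an unanswerable question counts as a fabrication, and the correctness of answers to answerable questions is not checked here.

```python
from dataclasses import dataclass


@dataclass
class Case:
    question: str
    answerable: bool  # label: does the source contain the answer?
    abstained: bool   # observed: did the prompt abstain on this question?


def score(cases: list[Case]) -> dict[str, float]:
    """Fabrication rate on unanswerable cases, refusal rate on answerable."""
    unanswerable = [c for c in cases if not c.answerable]
    answerable = [c for c in cases if c.answerable]
    if not unanswerable or not answerable:
        raise ValueError("test set needs both answerable and unanswerable cases")
    # Answering a question the source cannot support is a fabrication.
    fabricated = sum(1 for c in unanswerable if not c.abstained)
    # Abstaining on a question the source can support is an unnecessary refusal.
    refused = sum(1 for c in answerable if c.abstained)
    return {
        "fabrication_rate": fabricated / len(unanswerable),
        "unnecessary_refusal_rate": refused / len(answerable),
    }
```

Running `score` before and after each prompt change shows whether a fix for one failure mode has quietly worsened the other.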
Sequencing the Practices
The practices above are not a menu to pick from at random. They have a natural order, and applying them out of sequence wastes effort.
Grounding comes before everything
There is no point requiring evidence or running verification if the model has no source to ground in. Secure the source material first, restrict the model to it, and only then layer on the practices that depend on that foundation. A team that adds verification before fixing grounding is polishing a guess.
Abstention and structure come next
Once grounded, add the abstention clause and constrain the output shape. These two work together: structure tells the model where to put answers, and abstention tells it that empty is an acceptable value. Apply them as a pair, because structure without an exit forces the model to jam a guess into a required slot.
Verification and testing close the loop
Evidence requirements and a verification pass are the last line of defense, reserved for stakes that justify their cost. Testing on unanswerable questions wraps around all of it, because every other practice is unproven until you have measured it against the cases where fabrication lives. The sequence, end to end, mirrors the staged build in Build a Fabrication-Resistant Prompt in Eight Moves.
Frequently Asked Questions
What is the single most important practice?
Defaulting to grounding: treating the model's memory as unreliable and supplying source material for any specific fact. It addresses the root cause of most fabrication and converts a guess into a lookup. Everything else builds on top of it.
Are these practices model-specific?
No. They target how generation works, which is common across models. A newer or larger model may hallucinate somewhat less, but grounding, abstention, evidence requirements, and verification improve results on any model. They are portable, which is part of why they are worth investing in.
How do I balance abstention against usefulness?
Measure both. Track fabrications and unnecessary abstentions on a test set, and tune the abstention clause until both are low. The target is calibration, meaning the model answers when the source supports it and abstains when it does not, rather than maximizing either accuracy or caution in isolation.
Can required citations be trusted completely?
No. Models can fabricate citations or quote passages that do not actually support the claim. Citations reduce fabrication and surface gaps, but on high-stakes output you still want a separate verification pass to confirm the cited source genuinely supports the answer.
When is a verification pass not worth it?
When errors are cheap and the task is low-stakes. The second pass roughly doubles cost and latency, so for casual or easily corrected output it is overkill. Reserve it for cases where a confident wrong answer creates real risk or liability.
Key Takeaways
- Default to grounding and treat the model's memory as an unreliable last resort for any specific fact.
- Make abstention a first-class, desirable outcome, then calibrate it so the model does not refuse answerable questions.
- Require evidence for every claim and treat unsupported claims as a signal to abstain, while remembering citations can be faked.
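- Separate generation from verification for high-stakes tasks, accepting the added cost and latency.
- Constrain the output as well as the input, and pair structure with abstention so empty slots are allowed to stay empty.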
- Build your testing around unanswerable questions, where fabrication actually lives, and tune toward calibration rather than any single extreme.