One-off prompt tweaks get a model citing sources for a single task. They do not scale. The moment a second person edits the prompt, or the model version changes, or the document set grows, the carefully tuned wording stops working and the citations degrade. What survives those changes is not clever phrasing but structure: a repeatable arrangement of stages that holds regardless of who runs it.
This article lays out a four-stage model we call the GROUND model: Gather, Ratify, Output, Underwrite. Each stage names a distinct job in the pipeline, and each has a trigger that tells you when it matters. You can apply all four at full rigor for high-stakes work, or run them at lighter intensity for low-stakes tasks. The point is to have a shared vocabulary so a team can reason about citation quality instead of trading prompt snippets in chat.
Below, each stage gets a definition, the problem it solves, and a note on when to apply it. Read it once as a whole, then keep it nearby as a reference when you design a new prompt or audit one that is producing weak citations.
Stage One: Gather
What it covers
Gather is everything that happens before the model writes a word: deciding what sources exist, retrieving them, and presenting them in a form the model can cite. A citation pipeline with a weak Gather stage cannot be rescued by clever output instructions, because the model has nothing trustworthy to point at.
When it applies
Always, for any open-book task. If you are asking the model to answer from provided material, Gather is where you win or lose. The discipline here mirrors the source-preparation items in Make the Model Show Its Receipts.
- Assign each document a stable identifier the model can copy verbatim.
- Strip or label low-quality sources so the model is not citing noise.
- Decide whether the task is open-book or closed-book and instruct accordingly.
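The three practices above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Source` record, the `SRC-` prefix, the `trusted` flag, and the choice to derive the identifier from a hash of the title are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    source_id: str  # identifier the model is expected to copy verbatim
    title: str
    text: str
    trusted: bool  # hypothetical quality flag set during curation


def make_source(title: str, text: str, trusted: bool = True) -> Source:
    """Derive the identifier from the title so it does not depend on list order."""
    digest = hashlib.sha256(title.encode("utf-8")).hexdigest()[:8]
    return Source(f"SRC-{digest}", title, text, trusted)


def render_context(sources: list[Source], open_book: bool = True) -> str:
    """Build the source block of the prompt, leaving out untrusted sources."""
    if not open_book:
        return "No sources are provided. Answer from general knowledge and do not cite."
    blocks = [f"[{s.source_id}] {s.title}\n{s.text}" for s in sources if s.trusted]
    return "Answer only from the sources below.\n\n" + "\n\n".join(blocks)
```

Hashing the title rather than numbering the documents means the same document keeps the same identifier when the document set grows or is reordered. A team that prefers to label low-quality sources instead of dropping them could keep them in the block with an explicit warning line.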
Stage Two: Ratify
What it covers
Ratify is the instruction layer that tells the model what counts as an acceptable citation. It defines the format, the rule that every claim needs support, and the explicit prohibition against invention. Ratify turns a vague hope for citations into an enforceable contract.
When it applies
Whenever you write or revise the prompt. This is the stage most teams over-invest in and the one most likely to be copied incorrectly between projects, which is exactly why naming it helps.
- State the citation format with one worked example.
- Require a source marker for every factual claim.
- Forbid inventing sources and instruct the model to flag unsupported claims.
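One way to keep the Ratify contract from drifting between projects is to hold it in a single constant rather than retyping it. The sketch below is an assumption-laden example: the bracketed `SRC-` marker format, the `[UNSUPPORTED]` flag, and the exact wording are illustrative, and the right phrasing will vary by model and vendor.

```python
# Hypothetical citation contract: one format example, one coverage rule,
# one prohibition with an explicit fallback for unsupported claims.
CITATION_RULES = (
    "Citation rules:\n"
    "1. Cite by placing the source identifier in square brackets, for example: "
    '"Revenue rose 4% [SRC-3f9a1c2e]."\n'
    "2. Every factual claim must carry at least one source marker.\n"
    "3. Never invent a source or an identifier. If no provided source supports "
    "a claim, mark it [UNSUPPORTED] instead of citing.\n"
)


def build_prompt(source_block: str, question: str) -> str:
    """Combine the citation contract, the prepared sources, and the question."""
    return f"{CITATION_RULES}\nSources:\n{source_block}\n\nQuestion: {question}\n"
```

Keeping the rules in one place means a copy between projects is a copy of the whole contract, not a partial paraphrase of it.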
Stage Three: Output
What it covers
Output governs the shape of what comes back: inline markers versus a reference list, quoted spans for critical facts, and visible uncertainty flags. A well-designed Output stage makes the next stage, Underwrite, fast and cheap rather than slow and manual.
When it applies
On any task where a human or downstream system will consume the citations. If the output feeds a report a client reads, structure it for verifiability, not just readability.
- Place source markers immediately after the claims they support.
- Require short verbatim quotes for numbers, dates, and named entities.
- Ask the model to mark partially supported claims explicitly.
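A structured output is one a program can take apart. The following sketch assumes a hypothetical convention of one claim per line, markers of the form `[SRC-xxxxxxxx]`, verbatim quotes in double quotation marks, and a `[PARTIAL]` flag for partially supported claims; none of these is mandated by the framework.

```python
import re
from dataclasses import dataclass

MARKER = re.compile(r"\[(SRC-[0-9a-f]{8})\]")  # assumed marker format
QUOTE = re.compile(r'"([^"]+)"')  # assumed quoting convention


@dataclass
class Claim:
    text: str
    source_ids: list[str]
    quotes: list[str]
    partial: bool


def parse_claims(answer: str) -> list[Claim]:
    """Split an answer into per-line claims with their markers, quotes, and flags."""
    claims = []
    for line in answer.splitlines():
        line = line.strip()
        if not line:
            continue
        claims.append(
            Claim(
                text=line,
                source_ids=MARKER.findall(line),
                quotes=QUOTE.findall(line),
                partial="[PARTIAL]" in line,
            )
        )
    return claims
```

A claim that comes back with an empty `source_ids` list is an immediate candidate for review, which is the kind of cheap signal a well-shaped output makes possible.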
Stage Four: Underwrite
What it covers
Underwrite is the human and automated checking that stands behind every published citation. The name is deliberate: someone is putting their name behind the claim that the citations are real and correct. Skipping Underwrite means nobody owns the accuracy of the output.
When it applies
On any output that leaves the building. The intensity scales with stakes, a calibration explored in The Decision Behind How Hard You Push Citations.
- Verify a sample of citations on routine work and every citation on high-stakes work.
- Automate checks that quoted spans appear verbatim in the named source.
- Log failures and feed them back into Gather and Ratify.
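The automated part of these checks can be small. Below is a minimal sketch, assuming citations arrive as `(source_id, quote)` pairs and sources as a dictionary from identifier to text; the function names, the whitespace normalization, and the default sample size are illustrative choices, not part of the framework.

```python
import random


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not cause false failures."""
    return " ".join(text.split())


def verify_quotes(citations, sources):
    """Return (source_id, quote, reason) for every citation that fails the check."""
    failures = []
    for source_id, quote in citations:
        source_text = sources.get(source_id)
        if source_text is None:
            failures.append((source_id, quote, "unknown source"))
        elif normalize(quote) not in normalize(source_text):
            failures.append((source_id, quote, "quote not found"))
    return failures


def pick_for_review(citations, high_stakes, sample_size=5, seed=0):
    """Select citations for human review: all on high-stakes work, a sample otherwise."""
    if high_stakes or len(citations) <= sample_size:
        return list(citations)
    return random.Random(seed).sample(list(citations), sample_size)
```

The list returned by `verify_quotes` doubles as the failure log: each entry names the source and the reason, which is enough to route it back to Gather or Ratify. A passing verbatim check only shows the quote exists in the source, not that it supports the claim, so it complements human review rather than replacing it.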
The metrics you watch during Underwrite, accuracy and coverage, are detailed in Counting What a Good Citation Actually Looks Like, which turns this stage from a gut check into a tracked number.
Applying the Model in Practice
Match the stages to the stakes
Not every task earns all four stages at full intensity. A draft brainstorm might run a light Gather and a basic Ratify with little or no formal Underwrite. A regulatory summary runs all four at maximum rigor. The GROUND model gives you a vocabulary for making that trade-off consciously instead of by accident.
- For low-stakes drafts: light Gather, basic Ratify, minimal Underwrite.
- For client deliverables: full Gather and Ratify, structured Output, sampled Underwrite.
- For regulated or public claims: every stage at full intensity.
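Writing the tiers down as data makes the trade-off explicit and reviewable. This sketch encodes the three tiers listed above; the tier keys and the `plan_for` helper are hypothetical names, and any stage a tier does not mention is reported as "unspecified" rather than guessed.

```python
STAGES = ("gather", "ratify", "output", "underwrite")

# Intensity per stage for each stakes tier, taken from the list above.
STAGE_PLAN = {
    "low_stakes_draft": {
        "gather": "light",
        "ratify": "basic",
        "underwrite": "minimal",
    },
    "client_deliverable": {
        "gather": "full",
        "ratify": "full",
        "output": "structured",
        "underwrite": "sampled",
    },
    "regulated_or_public": {stage: "full" for stage in STAGES},
}


def plan_for(tier: str) -> dict:
    """Return the intensity of every stage for a tier, in pipeline order."""
    if tier not in STAGE_PLAN:
        raise ValueError(f"unknown stakes tier: {tier!r}")
    plan = STAGE_PLAN[tier]
    return {stage: plan.get(stage, "unspecified") for stage in STAGES}
```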
Diagnose failures by stage
When citations go wrong, the GROUND model tells you where to look. Fabricated sources usually mean a weak Gather (nothing real to cite) or a weak Ratify (no prohibition against invention). Real sources attached to the wrong claim point at Output. Real-but-wrong claims that survive review point at Underwrite.
- Name the failing stage before rewriting anything.
- Fix the earliest failing stage first, since later stages depend on it.
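The diagnosis rule can be written as a lookup followed by a sort into pipeline order. This is a sketch only: the symptom labels are hypothetical names for the three failure patterns described above, and real triage will involve more judgment than a table.

```python
STAGE_ORDER = ["gather", "ratify", "output", "underwrite"]

# Hypothetical symptom labels mapped to the stages the text associates with them.
SYMPTOM_TO_STAGES = {
    "fabricated_source": ["gather", "ratify"],
    "misattributed_source": ["output"],
    "wrong_claim_passed_review": ["underwrite"],
}


def stages_to_inspect(symptoms):
    """Return the suspect stages in pipeline order, earliest first."""
    suspects = {
        stage
        for symptom in symptoms
        for stage in SYMPTOM_TO_STAGES[symptom]
    }
    return [stage for stage in STAGE_ORDER if stage in suspects]
```

For example, `stages_to_inspect(["misattributed_source", "fabricated_source"])` puts Gather at the head of the list, so the earliest stage is examined before anything downstream is rewritten.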
Wiring the Stages Together
Hand off cleanly between stages
The stages fail at their seams as often as in their middles. Gather assigns an identifier that Ratify must reference; Ratify sets a format that Output must enforce; Output produces quotes that Underwrite must check. A break at any handoff degrades everything downstream, so treat the connections as first-class, not as gaps between the real work.
- Confirm identifiers assigned in Gather survive intact into Output.
- Make sure the format Ratify specifies is the format Underwrite knows how to verify.
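Both handoff checks reduce to sharing one definition of the marker format and comparing identifiers across stages. The sketch below assumes a hypothetical `[SRC-xxxxxxxx]` marker; the names `MARKER_EXAMPLE`, `MARKER_PATTERN`, and `unknown_ids` are illustrative.

```python
import re

# One shared definition of the marker format. The example string goes into
# the prompt and the pattern goes into the verifier, so they cannot drift.
MARKER_EXAMPLE = "[SRC-3f9a1c2e]"
MARKER_PATTERN = re.compile(r"\[(SRC-[0-9a-f]{8})\]")

# Fail fast if the example shown to the model is not something the verifier accepts.
assert MARKER_PATTERN.fullmatch(MARKER_EXAMPLE)


def unknown_ids(gathered_ids: set[str], answer: str) -> list[str]:
    """Return cited identifiers never assigned in Gather, in order of first appearance."""
    unknown = []
    for source_id in MARKER_PATTERN.findall(answer):
        if source_id not in gathered_ids and source_id not in unknown:
            unknown.append(source_id)
    return unknown
```

This check only sees markers that match the pattern. A malformed marker passes through silently, so a fuller version would also flag bracketed text that looks like a citation but does not parse.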
Keep a single owner per stage
When everyone owns citation quality, no one does. Assigning a clear owner to each stage, even informally, prevents the diffusion of responsibility that lets failures slip through. The owner of Underwrite, in particular, is the person putting their name behind the published claim, which is why the stage carries that name.
- Name an owner for each stage so accountability is unambiguous.
- Make the Underwrite owner the final sign-off on anything that ships.
Evolve the model with your evidence
A framework that never changes calcifies. As you log failures and learn where your pipeline actually breaks, fold those lessons back into how you run each stage. The four stages are stable; the practices inside them should sharpen over time as your evidence accumulates.
- Feed verified failures back into the stage that produced them.
- Revisit your per-stage practices on a regular cadence, not only after incidents.
Frequently Asked Questions
Is this framework specific to one model or vendor?
No. The four stages describe jobs that exist in any citation pipeline regardless of which model you use. The wording inside Ratify and the retrieval mechanics inside Gather will differ between vendors, but the structure holds. That portability is the main reason to think in stages rather than memorizing one provider's prompt.
Can I skip the Underwrite stage to move faster?
Only when the cost of a wrong citation is genuinely low, such as internal brainstorming. For anything a client or the public sees, skipping Underwrite means publishing claims nobody has verified. You can make Underwrite cheap by automating verbatim-quote checks, but you should not remove the human checkpoint entirely on high-stakes work.
Where do most teams go wrong with this model?
They over-invest in Ratify, the prompt-wording stage, and under-invest in Gather and Underwrite. A perfectly worded instruction cannot make a model cite sources that were never supplied, and it cannot catch a citation that is correctly formatted but factually wrong. Balance matters more than any single stage.
How is this different from a plain checklist?
A checklist tells you what to do; the framework tells you why each item exists and where it belongs. When something breaks, the stages let you localize the problem instead of rerunning the whole checklist. The two complement each other: use the framework to reason, the checklist to execute.
How long does it take to adopt across a team?
Most teams internalize the four stage names within a week of using them in reviews. The vocabulary is the fast part. Building good Gather infrastructure and disciplined Underwrite habits takes longer, but those investments pay off across every future project rather than a single prompt.
Key Takeaways
- Clever prompt wording does not survive edits, version changes, or growing document sets; structure does.
- The GROUND model splits the citation pipeline into Gather, Ratify, Output, and Underwrite, each with a clear job and trigger.
- Match stage intensity to stakes rather than running every stage at full rigor on every task.
- Diagnose citation failures by naming the failing stage before rewriting anything.
- Balance the stages; over-investing in prompt wording while neglecting sources or verification is the most common mistake.