Make the Model Show Its Receipts: A Citation Checklist

A language model will happily produce a confident paragraph with a footnote attached to nothing. The footnote looks like proof, but it points at a fact the model invented or a document it never read. For agency teams shipping research summaries, client-facing briefs, and retrieval-augmented assistants, that gap between the appearance of a citation and an actual verifiable source is where reputations get damaged.

The fix is rarely a single magic instruction. It is a set of small, deliberate constraints applied consistently across the prompt, the retrieval layer, and the review pass. This article gives you a checklist you can paste into your own runbook. Each item includes the reason it earns a spot, because a checklist nobody understands is a checklist nobody follows.

Treat the items below as defaults you adapt, not commandments. Some apply only when you have a retrieval system attached; others apply to any model you can prompt. Mark the ones that fit your workflow and revisit the list whenever a citation slips through that should not have.

Before You Write the Prompt

Confirm the model actually has sources to cite

A model cannot cite what it cannot see. If you are asking for citations without supplying documents, you are inviting the model to fabricate plausible-looking references. Decide up front whether this is a closed-book task (the model answers from training data and should say so) or an open-book task (the model answers from provided context and must cite it).

Attach the source material as context, or wire up retrieval, before demanding citations.
If no sources are available, instruct the model to say so rather than guess.

Assign stable identifiers to every source

Give each document a short, unambiguous label such as [S1], [S2], or a filename. Models are far more reliable at reproducing a token they were handed than at reconstructing a long URL or a full bibliographic string from memory.

Number sources in the order you supply them.
Keep identifiers short so the model copies them exactly.
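One way to apply the labeling step is sketched below. The function name label_sources and the sample documents are hypothetical, and the [S1]-style layout of the context block is just one reasonable choice, not a required format.

```python
def label_sources(documents):
    """Assign S1, S2, ... in supply order and build a context block.

    Returns (labeled, context): a dict of identifier -> text, and a string
    that can be pasted into the prompt.
    """
    labeled = {f"S{i}": text.strip() for i, text in enumerate(documents, start=1)}
    context = "\n\n".join(f"[{sid}]\n{text}" for sid, text in labeled.items())
    return labeled, context


# Illustrative placeholder documents, not real data.
docs = ["Revenue grew 12% in 2023.", "The office opened in Leeds in 2019."]
sources, context_block = label_sources(docs)
print(context_block)
```

Keeping the identifier-to-text mapping around (here, the sources dict) is what lets later checks confirm that a cited label points at a real document.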

Writing the Instruction

State the citation format explicitly

Do not assume the model knows you want inline brackets, footnotes, or a reference list. Spell it out, and show one example. Ambiguity here produces inconsistent output that is painful to parse downstream.

Specify inline markers like [S1] immediately after the claim they support.
Provide a single worked example in the prompt so the model matches the shape.

Require a citation for every factual claim

The instruction that does the heavy lifting is a rule that ties each non-obvious statement to a source. Phrase it as a hard requirement, not a suggestion. This mirrors the discipline discussed in A Citation Discipline You Can Actually Reuse, where structure beats one-off wording.

Tell the model: every factual sentence must end with at least one source marker.
Allow uncited sentences only for reasoning, transitions, or clearly labeled opinion.
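A rough check for this rule can be scripted. The sketch below assumes markers of the form [S1] and uses a naive sentence splitter that will stumble on abbreviations; it cannot tell a factual sentence from reasoning, so it only produces a list of candidates for a reviewer to look at.

```python
import re

MARKER = re.compile(r"\[S\d+\]")
# Naive split: whitespace that follows ., ! or ? (an assumption, not a real tokenizer).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def uncited_sentences(answer):
    """Return the sentences in `answer` that carry no source marker."""
    sentences = [s.strip() for s in SENTENCE_END.split(answer) if s.strip()]
    return [s for s in sentences if not MARKER.search(s)]


# Illustrative model output, not real data.
answer = (
    "The pilot ran for six weeks [S2]. "
    "That suggests demand is real. "
    "Churn fell by a third."
)
for sentence in uncited_sentences(answer):
    print("REVIEW:", sentence)
```

In this example the second sentence is reasoning and may legitimately go uncited, while the third is a factual claim that should have carried a marker; the script surfaces both and leaves the judgment to a person.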

Forbid invented sources in plain language

Models pattern-match on the format of citations, which is exactly why they fabricate them. A direct prohibition, stated in the negative, tends to reduce invented references, though it does not eliminate them on its own.

Add: do not invent sources; only cite from the provided list.
Instruct the model to flag any claim it cannot support rather than fabricate one.
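The three instruction items in this section can live in one reusable template. The wording below is a sketch rather than tested phrasing: the "UNSUPPORTED:" prefix is an assumed convention for flagging, the example sentence is a placeholder, and build_prompt is a hypothetical helper.

```python
CITATION_RULES = """\
Answer only from the sources listed below.
Rules:
1. Put an inline marker such as [S1] immediately after each claim it supports.
2. Every factual sentence must end with at least one source marker.
3. Reasoning, transitions, and clearly labeled opinion may go uncited.
4. Do not invent sources; only cite from the provided list.
5. If you cannot support a claim, prefix it with "UNSUPPORTED:" instead of citing.
Example of the expected shape:
The pilot ran for six weeks [S2].
"""


def build_prompt(context_block, question):
    """Combine the fixed rules, the labeled sources, and the user question."""
    return f"{CITATION_RULES}\nSources:\n{context_block}\n\nQuestion: {question}\n"


# Placeholder source text and question.
prompt = build_prompt("[S1]\nRevenue grew 12% in 2023.", "How did revenue change?")
print(prompt)
```

Holding the rules in a single constant means a fix prompted by a logged failure is made once and reaches every task that uses the template.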

Controlling the Output

Demand quoted spans for high-stakes claims

For numbers, dates, names, and anything a client might act on, ask the model to include a short verbatim quote from the source alongside the citation. A quote is far easier to verify than a bare reference and exposes paraphrase drift.

Request a quoted snippet of fewer than 25 words for each critical fact.
Verify each quote against its source; with a short snippet this is a five-second check.
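The quote check lends itself to a script. This sketch assumes the model was told to write each critical fact as a marker followed by a double-quoted snippet, for example [S1] "...", and that sources is a dict of identifier to document text; both are assumptions about your setup. It ignores whitespace differences but otherwise requires an exact match.

```python
import re

# Assumed output shape: [S1] "verbatim snippet"
QUOTED_CITATION = re.compile(r'\[(S\d+)\]\s*"([^"]+)"')


def _squash(text):
    """Collapse runs of whitespace so line wrapping does not cause false misses."""
    return " ".join(text.split())


def check_quotes(answer, sources, max_words=25):
    """Return (source_id, quote, problem) for each quote that fails a check."""
    problems = []
    for sid, quote in QUOTED_CITATION.findall(answer):
        if sid not in sources:
            problems.append((sid, quote, "unknown source"))
        elif _squash(quote) not in _squash(sources[sid]):
            problems.append((sid, quote, "quote not found in source"))
        elif len(quote.split()) >= max_words:
            problems.append((sid, quote, "quote too long"))
    return problems


# Illustrative placeholder data.
sources = {"S1": "Revenue grew 12% in 2023, driven by new contracts."}
answer = (
    'Revenue rose [S1] "Revenue grew 12% in 2023". '
    'Margins doubled [S1] "margins doubled in 2023".'
)
print(check_quotes(answer, sources))
```

A passing quote only shows the words exist in the source. Whether they support the claim they are attached to still needs a reader.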

Make uncertainty visible

A model that must signal when support is weak gives your reviewers a map of where to look. Without this, every sentence looks equally trustworthy.

Ask the model to mark low-confidence or partially supported claims.
Treat any flagged claim as unpublishable until a human confirms it.

The Review Pass

Spot-check citations against the actual text

Automated generation does not remove the human checkpoint; it relocates it. The cheapest insurance against a public error is a reviewer who opens two or three cited sources and confirms they say what the model claims. This connects to the measurement habits in Counting What a Good Citation Actually Looks Like.

Verify a sample of citations on every output, and verify all of them on high-stakes work.
Log every miss so you can tighten the prompt that produced it.

Close the loop on failures

A citation that slips through is a free lesson if you capture it. Feed recurring failure patterns back into your prompt, your retrieval filters, or your reviewer guidance. Teams that skip this step relearn the same mistake every week, a pattern explored in The Usual Ways Citation Prompts Quietly Fail.

Keep a short log of citation failures and their root cause.
Update the prompt template when the same failure appears twice.
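The log does not need tooling beyond a list of records. The sketch below is one minimal shape for it, with hypothetical field names and made-up entries; the threshold of two follows the rule above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationFailure:
    output_id: str   # which deliverable the miss appeared in
    claim: str       # the sentence that was wrongly or weakly cited
    root_cause: str  # short label agreed on by reviewers


def causes_needing_prompt_update(log, threshold=2):
    """Return root causes that have occurred at least `threshold` times."""
    counts = Counter(failure.root_cause for failure in log)
    return sorted(cause for cause, n in counts.items() if n >= threshold)


# Illustrative entries, not real incidents.
log = [
    CitationFailure("brief-014", "Churn fell by a third [S2].", "quote paraphrased"),
    CitationFailure("brief-015", "Launch was in March [S4].", "wrong source id"),
    CitationFailure("brief-019", "Prices rose 8% [S1].", "quote paraphrased"),
]
print(causes_needing_prompt_update(log))
```

The labels only count up if reviewers reuse them consistently, so a short fixed vocabulary of root causes works better than free text.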

Edge Cases Worth a Checklist Item

Handle claims that span multiple sources

Some statements draw on two or three documents at once, and a single marker undersells the support. Instruct the model to attach every relevant identifier rather than picking one, so a reviewer can see the full basis for the claim. This matters most for synthesized conclusions, which are exactly the claims a reader is likely to challenge.

Allow and request multiple markers when a claim rests on several sources.
Treat a synthesized conclusion with a single source as a flag to inspect.

Decide what to do with common knowledge

Not every sentence needs a citation. Widely known facts and your own reasoning do not require a source, and forcing citations onto them produces noise that buries the citations that matter. Draw the line explicitly in your instruction so the model knows what to leave uncited.

Exempt genuinely common knowledge and clearly labeled reasoning from the citation rule.
Keep the exemption narrow so it does not become a loophole for unsupported claims.

Preserve citations through downstream edits

A citation is only useful if it survives the editing and formatting steps that follow generation. Teams often strip markers during cleanup and ship unsupported prose. Make citation preservation an explicit step in your production process, not an afterthought a copy editor quietly undoes.

Confirm that markers and quotes survive every formatting and editing pass.
Re-verify a sample after final formatting, since edits can break the link.
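A quick way to catch stripped markers is to compare marker counts before and after editing. This sketch assumes [S1]-style markers and plain-text versions of both drafts; lost_markers is a hypothetical helper.

```python
import re
from collections import Counter

MARKER = re.compile(r"\[S\d+\]")


def lost_markers(draft, final):
    """Return markers that occur fewer times in `final` than in `draft`."""
    before = Counter(MARKER.findall(draft))
    after = Counter(MARKER.findall(final))
    # Counter subtraction keeps only positive differences.
    return dict(before - after)


# Illustrative placeholder text.
draft = "Revenue grew 12% [S1]. The office opened in 2019 [S2]."
final = "Revenue grew 12% [S1]. The office opened in 2019."
print(lost_markers(draft, final))
```

The count comparison flags deletions only. A marker that was moved onto a different sentence during editing passes this check, which is why the re-verification sample still matters.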

Frequently Asked Questions

Do I need a retrieval system to get reliable citations?

Not always, but it helps enormously. Without retrieval or manually supplied documents, the model cites from training data, which it cannot reliably reproduce or verify. With retrieval, you control exactly which documents are available, and citations point at real text you can check. For any task where accuracy matters, supplying the sources yourself is the more dependable path.

Why do models invent citations even when told not to?

Citations are a learned format, and models reproduce formats fluently whether or not the underlying fact is real. A bare prohibition reduces fabrication but does not eliminate it. Pairing the prohibition with stable source identifiers and a requirement to quote verbatim spans makes invention much harder, because the model has to point at text that either exists or does not.

How many citations should one answer contain?

Enough that every factual claim is supported, and no more. Over-citing dilutes signal and makes review harder; under-citing leaves claims unsupported. A practical target is one source marker per factual sentence, with multiple markers when a claim draws on several documents.

Can I automate the verification step?

Partially. You can automate checks that a cited identifier exists in the source list and that a quoted span appears verbatim in the named document. What you cannot fully automate is judging whether the source genuinely supports the claim's meaning. Keep a human in the loop for high-stakes output.
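The identifier check is the simplest of the automatable pieces. A minimal sketch, assuming [S1]-style markers and a known list of supplied identifiers:

```python
import re

MARKER = re.compile(r"\[(S\d+)\]")


def unknown_identifiers(answer, source_ids):
    """Return cited identifiers that are not in the supplied source list."""
    cited = set(MARKER.findall(answer))
    return sorted(cited - set(source_ids))


# Illustrative placeholder data.
answer = "Churn fell by a third [S1]. Prices rose 8% [S4]."
print(unknown_identifiers(answer, ["S1", "S2"]))
```

Any identifier this returns is a fabricated or mistyped reference and can block publication automatically, without waiting for a human pass.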

What is the single highest-impact item on this list?

Assigning stable identifiers to your sources and supplying them as context. Once the model is citing from a known, labeled set rather than its memory, almost every other item becomes enforceable. It turns citation from an act of recall into an act of copying.

Key Takeaways

Models fabricate citations because citation is a format they imitate; constrain the format and the source set, not just the wording.
Supply labeled sources as context before demanding citations, and forbid the model from inventing references.
Require a source marker for every factual claim and verbatim quotes for high-stakes facts.
Make uncertainty visible so reviewers know exactly where to look.
Keep a human verification pass and log every miss to tighten the prompt over time.

Agency Script Editorial
