Set Plays for Reliable Source Citations

A playbook is different from a guide. A guide explains a topic; a playbook tells you exactly what to run, when to run it, and who is responsible. Source-citing is a good candidate for a playbook because it is not one technique—it is a sequence of plays that each handle a different situation, and most teams fail by running only one of them.

The team that adds "cite your sources" to a prompt and stops there is running a single play and assuming it covers every case. It does not. There is a play for setting the standard, one for grounding the task, one for issuing the citation instruction, a verification play, an escalation play, and a maintenance play. Knowing which to run when is the actual skill.

This article lays out the full set as a sequenced operating system. Each play has a trigger (when it fires), an action (what to do), and an owner (who is accountable). Run them in order and source-citing stops being a hopeful instruction and becomes a reliable capability.

Play 1: Establish the Citation Standard

Everything downstream depends on this. Run it first, once, then maintain it.

Trigger and action

Trigger: Before any team-wide use of source-citing.
Action: Write a one-page standard defining what counts as a source, the required format, the granularity of citation, and the honesty clause.
Owner: The prompt-quality lead.

Why it comes first

Without a written standard, every other play has nothing to enforce. The standard is the constitution; the rest of the playbook executes it. Keep it short enough that people actually read it and specific enough that "cite your sources" means the same thing to everyone. This anchors the broader effort described in Rolling Out Source-Citing Across a Team.
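One way to keep the standard enforceable is to hold its four elements as structured data alongside the written page. The sketch below is illustrative only: the class name, field names, and example values are assumptions, not part of any published standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CitationStandard:
    """The four elements the one-page standard defines (hypothetical field names)."""
    allowed_sources: tuple[str, ...]  # what counts as a source
    citation_format: str              # the required format
    granularity: str                  # how fine-grained citations must be
    honesty_clause: str               # what to do when no source supports a claim

    def missing_fields(self) -> list[str]:
        """Return the names of any elements left empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical example values; the real wording belongs to the prompt-quality lead.
STANDARD = CitationStandard(
    allowed_sources=("documents supplied in the prompt",),
    citation_format="[doc_id] followed by a quoted snippet",
    granularity="one citation per factual claim",
    honesty_clause="If no supplied source supports a claim, say so instead of citing.",
)
```

Calling `STANDARD.missing_fields()` returns an empty list when all four elements are filled in, which gives the lead a quick completeness check before the standard is announced.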

Play 2: Ground the Task

Citations are only as good as the material the model has to cite.

Trigger and action

Trigger: Any task requiring factual or quantitative claims.
Action: Provide the model with real source material—pasted text, attached files, or retrieved documents—before asking for claims and citations.
Owner: The author of the prompt.

The grounding hierarchy

Best: Retrieval that pulls relevant documents into context, as in RAG Implementation.
Good: Manually pasted or attached source material.
Risky: Asking the model to cite from its general training with no source material in context, which invites fabrication.

The grounding play is what makes citations point at something real instead of something invented.
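A minimal sketch of the "Good" tier, assuming sources arrive as a mapping from a document ID to its text. The function name, the `SOURCES` and `TASK` labels, and the bracketed ID format are all hypothetical choices for illustration.

```python
from __future__ import annotations


def build_grounded_prompt(task: str, sources: dict[str, str]) -> str:
    """Place labeled source material ahead of the task so citations have a target."""
    if not sources:
        # Refuse to build an ungrounded prompt rather than let the model cite from memory.
        raise ValueError("No source material supplied; the task is not grounded.")
    blocks = [f"[{doc_id}]\n{text.strip()}" for doc_id, text in sources.items()]
    return "SOURCES\n\n" + "\n\n".join(blocks) + "\n\nTASK\n\n" + task.strip()
```

Raising an error on an empty source set is a deliberate design choice here: it makes the "Risky" tier something an author has to opt into on purpose rather than fall into by accident.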

Play 3: Issue the Citation Instruction

This is the play everyone knows—but it only works after Plays 1 and 2.

Trigger and action

Trigger: Every grounded task that needs verifiable claims.
Action: Apply the standard prompt block: source scope, format, quoted snippets, and the honesty clause.
Owner: The prompt author.

The honesty clause is non-negotiable here. It is the difference between a model that admits gaps and one that fills them with fabricated references. Run this play with the shared block from your prompt library, not freehand, so the standard is applied identically every time.
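The shared block can be as simple as a constant that every prompt appends verbatim. The wording below is a hypothetical example covering the four elements named above (source scope, format, quoted snippets, honesty clause), not a recommended canonical text.

```python
# Hypothetical wording; a team would substitute the text from its own standard.
CITATION_BLOCK = """\
Cite only the documents provided under SOURCES.
Format each citation as [doc_id] followed by a short quoted snippet.
Attach a citation to every factual or quantitative claim.
If no provided document supports a claim, write "no source found" instead of citing."""


def with_citation_block(grounded_prompt: str) -> str:
    """Append the shared block verbatim so every prompt carries identical wording."""
    return grounded_prompt.rstrip() + "\n\nCITATION RULES\n\n" + CITATION_BLOCK
```

Keeping the block in one place means a change made after an escalation reaches every author at once.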

Play 4: Verify in Tiers

The instruction produces citations; this play confirms they are real.

Trigger and action

Trigger: Before output ships or informs a decision.
Action: Apply tiered verification: check the quoted snippet on everything, and on high-stakes claims also confirm that the cited source exists and actually supports the claim.
Owner: The reviewer (for client-facing work) or the author (for internal drafts).

The tiering rule

Internal draft: Read the quoted snippets.
Client-facing: Verify existence and support on every load-bearing claim.
Contractual or regulated: Full verification plus retention of cited sources.

This is the same discipline detailed in Prompting for Error Detection and Correction: The Complete Guide, focused on citations.
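The snippet check is the one tier that automates easily. The sketch below assumes citations have already been parsed into a claim, a document ID, and a quoted snippet; the `Citation` type and the tier names in `TIER_CHECKS` are hypothetical. It confirms only that the cited document exists in the supplied set and contains the quoted text. Whether the snippet actually supports the claim remains a human judgment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Citation:
    claim: str
    doc_id: str
    snippet: str


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause false failures."""
    return " ".join(text.split()).lower()


def snippet_check(citation: Citation, sources: dict[str, str]) -> bool:
    """True when the cited document was supplied and contains the quoted snippet."""
    source = sources.get(citation.doc_id)
    if source is None or not citation.snippet.strip():
        return False
    return _normalize(citation.snippet) in _normalize(source)


# Checks expected per tier; only "snippet" is automated in this sketch.
TIER_CHECKS = {
    "internal": ["snippet"],
    "client": ["snippet", "existence", "support"],
    "regulated": ["snippet", "existence", "support", "retention"],
}


def failed_snippets(citations: list[Citation], sources: dict[str, str]) -> list[Citation]:
    """Return the citations a reviewer should look at first."""
    return [c for c in citations if not snippet_check(c, sources)]
```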

Play 5: Escalate Failures

What you do when verification fails determines whether the system learns.

Trigger and action

Trigger: A fabricated or mismatched citation is found.
Action: Correct the output, log the failure, and—if it reveals a pattern—update the standard or prompt block.
Owner: The reviewer who found it, escalating to the prompt-quality lead.

Why the log matters

A single caught error is a near-miss. A pattern of similar errors is a signal that the standard or prompt needs to change. The escalation play turns isolated catches into systemic improvement, the way any AI Prompt Governance process should.
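A log only reveals patterns if entries share a comparable label. The sketch below assumes each failure is tagged with a free-text category and treats three entries in one category as a pattern; the record fields and the threshold are assumptions to adjust, not fixed rules.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class CitationFailure:
    found_on: date
    kind: str      # "fabricated" or "mismatched"
    category: str  # hypothetical label, e.g. "quantitative claim"
    note: str


def recurring_categories(log: list[CitationFailure], threshold: int = 3) -> list[str]:
    """Return categories with at least `threshold` logged failures, sorted by name."""
    counts = Counter(entry.category for entry in log)
    return sorted(cat for cat, n in counts.items() if n >= threshold)
```

Categories returned by `recurring_categories` are the ones worth raising with the prompt-quality lead; anything below the threshold stays a corrected, logged near-miss.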

Play 6: Maintain the System

Citation behavior drifts; this play keeps the playbook current.

Trigger and action

Trigger: Monthly, or whenever the team changes models.
Action: Spot-check recent deliverables, re-test the prompt block against the current model, update the standard, and re-announce changes.
Owner: The prompt-quality lead.

Skipping maintenance is how fabricated citations creep back in after a model update. The playbook is a living system, not a one-time install.
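The trigger for this play is simple enough to encode as a reminder check. This sketch approximates "monthly" as 30 days and compares model names as plain strings; both are simplifying assumptions, and the function name is hypothetical.

```python
from datetime import date, timedelta


def maintenance_due(
    last_run: date,
    today: date,
    model_at_last_run: str,
    current_model: str,
    interval_days: int = 30,
) -> bool:
    """True when the model has changed or the review interval has elapsed."""
    if current_model != model_at_last_run:
        return True
    return today - last_run >= timedelta(days=interval_days)
```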

Running the Plays in Sequence

The order is the point. Run them out of sequence and the system breaks.

The canonical sequence

Set up once: Play 1 (standard), then Play 6's cadence (maintenance schedule).
Per task: Play 2 (ground) → Play 3 (instruct) → Play 4 (verify).
On failure: Play 5 (escalate) → feed back into Plays 1 and 3.

A team running only Play 3—the citation instruction—gets confident, sometimes-fabricated references with no safety net. A team running all six gets verifiable output that improves over time. The plays are cheap individually; their value comes from running them as a connected loop.
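The per-task and on-failure sequences can be sketched as a single function that fixes the order and leaves each play pluggable. Every parameter name here is hypothetical, and `call_model` stands in for whatever model client a team actually uses.

```python
from __future__ import annotations

from typing import Callable


def run_task(
    task: str,
    sources: dict[str, str],
    ground: Callable[[str, dict[str, str]], str],             # Play 2
    instruct: Callable[[str], str],                           # Play 3
    call_model: Callable[[str], str],                         # placeholder model client
    verify: Callable[[str, dict[str, str]], list[str]],       # Play 4, returns failures
    escalate: Callable[[str], None],                          # Play 5
) -> tuple[str, list[str]]:
    """Run ground, instruct, verify in order, escalating each failure found."""
    prompt = instruct(ground(task, sources))
    output = call_model(prompt)
    failures = verify(output, sources)
    for failure in failures:
        escalate(failure)
    return output, failures
```

Because grounding and instruction are composed before the model is called, there is no code path that issues the citation instruction against an ungrounded task.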

A worked example

Consider a research summary headed to a client. The prompt-quality lead has already run Play 1, so a standard exists, and Play 6's cadence is on the calendar. The analyst runs Play 2 by retrieving the relevant source documents into context, then Play 3 by pasting the shared citation block with its honesty clause. The model returns a summary with quoted snippets attached to each claim. The reviewer runs Play 4, reading every snippet and checking the two load-bearing statistics against the cited passages. One checks out; the other is attached to a source that gives a different figure. That triggers Play 5: the reviewer corrects the claim, logs the mismatch, and notes that quantitative claims in this document type are a recurring weak spot. At the next monthly review under Play 6, the lead sees three similar entries and updates the standard prompt block to require an explicit recompute step for figures. That is the loop working: one caught error becomes a permanent improvement.

Frequently Asked Questions

Which play do most teams skip?

Play 5, escalation, and Play 6, maintenance. Teams set up a standard and a prompt, then never close the loop when failures appear or re-test after a model change. The result is a system that looks fine on day one and quietly degrades. The feedback loop is what separates a playbook from a one-time setup.

Can a small team run all six plays?

Yes, scaled down. A two-person team can keep the standard to half a page, make verification a quick self-check, and do maintenance in fifteen minutes a month. The plays are about discipline, not headcount. What matters is that grounding, instruction, and verification all happen—not how formal the process looks.

How is this different from just having a good prompt?

A good prompt is Play 3. The playbook adds the standard that defines what good means, the grounding that gives citations something to point at, the verification that catches failures, and the maintenance that keeps it working. A prompt alone produces citations; the system makes them trustworthy.

Who should own the playbook?

A single prompt-quality lead owns the standard, maintenance, and escalation handling, while individual authors and reviewers own grounding, instruction, and verification on their work. Distributed execution with centralized ownership of the standard is the structure that scales without fragmenting.

What triggers a standard update?

A pattern of similar failures found during verification or maintenance, or a model change that alters citation behavior. One-off errors get corrected and logged; recurring ones change the standard or prompt block. Tie updates to evidence from the escalation log rather than to opinion.

How long until this feels routine?

A few weeks of consistent use. The first pass is deliberate; after running the per-task sequence a dozen times, grounding-then-instruct-then-verify becomes automatic. Maintenance stays a scheduled activity rather than a habit, which is why it needs a named owner and a calendar reminder.

Key Takeaways

Source-citing is a set of sequenced plays, not a single instruction; teams fail by running only the citation prompt.
Establish a written standard first (Play 1), then ground every factual task with real material (Play 2) before issuing the citation instruction (Play 3).
Verify in tiers matched to stakes (Play 4), and escalate failures into a log that drives standard updates (Play 5).
Maintain the system on a monthly cadence and re-test after model changes (Play 6)—the most-skipped plays are escalation and maintenance.
Run the plays as a connected loop: ground, instruct, verify per task; escalate on failure; maintain on schedule. The loop, not any single play, is what makes citations trustworthy.

Agency Script Editorial

