Set Plays for Reliable Source Citations

Agency Script Editorial

Editorial Team

November 5, 2020 · 7 min read
Tags: instructing models to cite sources, instructing models to cite sources playbook, instructing models to cite sources guide, prompt engineering

A playbook is different from a guide. A guide explains a topic; a playbook tells you exactly what to run, when to run it, and who is responsible. Source-citing is a good candidate for a playbook because it is not one technique—it is a sequence of plays that each handle a different situation, and most teams fail by running only one of them.

The team that adds "cite your sources" to a prompt and stops there is running a single play and assuming it covers every case. It does not. There is a play for grounded retrieval tasks, a different one for when the model has no source, a verification play, an escalation play, and a maintenance play. Knowing which to run when is the actual skill.

This article lays out the full set as a sequenced operating system. Each play has a trigger (when it fires), an action (what to do), and an owner (who is accountable). Run them in order and source-citing stops being a hopeful instruction and becomes a reliable capability.

Play 1: Establish the Citation Standard

Everything downstream depends on this. Run it first, once, then maintain it.

Trigger and action

  • Trigger: Before any team-wide use of source-citing.
  • Action: Write a one-page standard defining what counts as a source, the required format, the granularity of citation, and the honesty clause.
  • Owner: The prompt-quality lead.

Why it comes first

Without a written standard, every other play has nothing to enforce. The standard is the constitution; the rest of the playbook executes it. Keep it short enough that people actually read it and specific enough that "cite your sources" means the same thing to everyone. This anchors the broader effort described in Rolling Out Source-Citing Across a Team.
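
One way to keep the standard short and unambiguous is to store its four parts as structured data that renders to plain text. The sketch below is illustrative only: the class name `CitationStandard`, its field names, and the example wording are assumptions, not part of any published standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationStandard:
    """Hypothetical container for the four parts of the written standard."""

    source_definition: str  # what counts as a source
    citation_format: str    # the required format
    granularity: str        # how finely claims must be cited
    honesty_clause: str     # what the model does when no source supports a claim
    version: str = "1.0"    # bumped whenever the standard changes

    def render(self) -> str:
        """Return the standard as short plain text for a shared document."""
        return "\n".join(
            [
                f"Citation standard v{self.version}",
                f"Sources: {self.source_definition}",
                f"Format: {self.citation_format}",
                f"Granularity: {self.granularity}",
                f"Honesty clause: {self.honesty_clause}",
            ]
        )


# Example values only; each team writes its own wording.
STANDARD = CitationStandard(
    source_definition="Only documents supplied in the prompt or retrieved into context.",
    citation_format="[source_id] followed by a verbatim quoted snippet.",
    granularity="One citation per factual or quantitative claim.",
    honesty_clause="If no supplied source supports a claim, say so instead of citing.",
)

print(STANDARD.render())
```

Versioning the standard gives the escalation and maintenance plays something concrete to update.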

Play 2: Ground the Task

Citations are only as good as the material the model has to cite.

Trigger and action

  • Trigger: Any task requiring factual or quantitative claims.
  • Action: Provide the model with real source material—pasted text, attached files, or retrieved documents—before asking for claims and citations.
  • Owner: The author of the prompt.

The grounding hierarchy

  • Best: Retrieval that pulls relevant documents into context, as in RAG Implementation.
  • Good: Manually pasted or attached source material.
  • Risky: Asking the model to cite from its training data alone, which invites fabrication.

The grounding play is what makes citations point at something real instead of something invented.
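
A minimal sketch of the grounding step, assuming sources arrive as a mapping from an ID to text. The function name `build_grounded_context` and the `[S1]` labeling scheme are hypothetical. The function refuses to proceed when no usable material is supplied, so an ungrounded task cannot slip into the risky tier unnoticed.

```python
def build_grounded_context(sources: dict[str, str]) -> str:
    """Wrap each source in a labeled block so citations can point at an ID.

    `sources` maps a hypothetical source ID (for example "S1") to its text.
    """
    # Drop entries that are empty or whitespace-only.
    usable = {sid: text.strip() for sid, text in sources.items() if text.strip()}
    if not usable:
        # No material means the model could only cite from memory.
        raise ValueError("No source material supplied; ground the task first.")
    blocks = [f"[{sid}]\n{text}" for sid, text in usable.items()]
    return "\n\n".join(blocks)


context = build_grounded_context({"S1": "Revenue grew 12% in 2023.", "S2": "  "})
print(context)
```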

Play 3: Issue the Citation Instruction

This is the play everyone knows—but it only works after Plays 1 and 2.

Trigger and action

  • Trigger: Every grounded task that needs verifiable claims.
  • Action: Apply the standard prompt block: source scope, format, quoted snippets, and the honesty clause.
  • Owner: The prompt author.

The honesty clause is non-negotiable here. It is the difference between a model that admits gaps and one that fills them with fabricated references. Run this play with the shared block from your prompt library, not freehand, so the standard is applied identically every time.
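
The shared block might look something like the sketch below. The exact wording is an assumption, and `CITATION_BLOCK` and `build_prompt` are hypothetical names. The point is that the four parts (source scope, format, quoted snippets, honesty clause) live in one constant that every prompt reuses.

```python
# Shared block; the wording is illustrative and would come from the team's standard.
CITATION_BLOCK = "\n".join(
    [
        "Use only the sources provided above.",                         # source scope
        "Cite every factual claim as [source_id].",                     # format
        "After each citation, quote the supporting passage verbatim.",  # quoted snippets
        "If no provided source supports a claim, write 'No source found' "
        "instead of citing.",                                           # honesty clause
    ]
)


def build_prompt(context: str, task: str) -> str:
    """Place grounded context first, then the shared block, then the task."""
    return f"{context}\n\n{CITATION_BLOCK}\n\nTask: {task}"


prompt = build_prompt("[S1]\nRevenue grew 12% in 2023.", "Summarize revenue trends.")
print(prompt)
```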

Play 4: Verify in Tiers

The instruction produces citations; this play confirms they are real.

Trigger and action

  • Trigger: Before output ships or informs a decision.
  • Action: Apply tiered verification: a snippet check on every citation, plus existence and support checks on high-stakes claims.
  • Owner: The reviewer (for client-facing work) or the author (for internal drafts).

The tiering rule

  • Internal draft: Read the quoted snippets.
  • Client-facing: Verify existence and support on every load-bearing claim.
  • Contractual or regulated: Full verification plus retention of cited sources.

This is the same discipline detailed in Prompting for Error Detection and Correction: The Complete Guide, focused on citations.
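
The tiering rule can be sketched as a small checker. Everything here is an assumption for illustration: the `Citation` fields, the tier names `"internal"`, `"client"`, and `"regulated"`, and the choice to treat the snippet check as a verbatim match after normalizing whitespace and case. Code can confirm that a quoted snippet appears in the cited source, but not that the snippet supports the claim, so the support check is modeled as a flag a human reviewer sets. Source retention for the regulated tier is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    claim: str
    source_id: str
    snippet: str
    load_bearing: bool = False
    support_confirmed: bool = False  # set by a human reviewer, never by code


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting noise is ignored."""
    return " ".join(text.split()).lower()


def snippet_in_source(citation: Citation, sources: dict[str, str]) -> bool:
    """True if the quoted snippet appears in the cited source text."""
    snippet = _normalize(citation.snippet)
    return bool(snippet) and snippet in _normalize(sources[citation.source_id])


def _needs_support_check(citation: Citation, tier: str) -> bool:
    """Regulated work checks every claim; client work checks load-bearing ones."""
    return tier == "regulated" or (tier == "client" and citation.load_bearing)


def verify(citations: list[Citation], sources: dict[str, str], tier: str):
    """Return (claim, problem) pairs; an empty list means nothing was flagged."""
    problems = []
    for c in citations:
        if c.source_id not in sources:
            problems.append((c.claim, "cited source not found"))
        elif not snippet_in_source(c, sources):
            problems.append((c.claim, "snippet not found in cited source"))
        elif _needs_support_check(c, tier) and not c.support_confirmed:
            problems.append((c.claim, "support not yet confirmed by reviewer"))
    return problems
```

An empty result does not prove a citation is sound. It only means the mechanical checks passed and any required human confirmation was recorded.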

Play 5: Escalate Failures

What you do when verification fails determines whether the system learns.

Trigger and action

  • Trigger: A fabricated or mismatched citation is found.
  • Action: Correct the output, log the failure, and—if it reveals a pattern—update the standard or prompt block.
  • Owner: The reviewer who found it, escalating to the prompt-quality lead.

Why the log matters

A single caught error is a near-miss. A pattern of similar errors is a signal that the standard or prompt needs to change. The escalation play turns isolated catches into systemic improvement, the way any AI Prompt Governance process should.
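
A failure log only helps if someone can see the patterns in it. The sketch below groups entries by document type and failure kind and reports any group that reaches a threshold. The `FailureEntry` fields and the threshold of three are assumptions; a team would choose its own.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailureEntry:
    found_on: date
    document_type: str  # e.g. "research summary"
    failure_kind: str   # e.g. "fabricated source", "mismatched figure"


PATTERN_THRESHOLD = 3  # assumption: this many similar entries count as a pattern


def recurring_patterns(log: list[FailureEntry], threshold: int = PATTERN_THRESHOLD):
    """Return (document_type, failure_kind) pairs seen at least `threshold` times."""
    counts = Counter((e.document_type, e.failure_kind) for e in log)
    return [key for key, n in counts.items() if n >= threshold]


log = [
    FailureEntry(date(2020, 10, 5), "research summary", "mismatched figure"),
    FailureEntry(date(2020, 10, 12), "research summary", "mismatched figure"),
    FailureEntry(date(2020, 10, 20), "blog draft", "fabricated source"),
    FailureEntry(date(2020, 10, 28), "research summary", "mismatched figure"),
]
print(recurring_patterns(log))
```

Entries below the threshold are still corrected and logged; they just do not yet justify changing the standard or prompt block.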

Play 6: Maintain the System

Citation behavior drifts; this play keeps the playbook current.

Trigger and action

  • Trigger: Monthly, or whenever the team changes models.
  • Action: Spot-check recent deliverables, re-test the prompt block against the current model, update the standard, and re-announce changes.
  • Owner: The prompt-quality lead.

Skipping maintenance is how fabricated citations creep back in after a model update. The playbook is a living system, not a one-time install.
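
The two triggers can be reduced to a single check, sketched here under stated assumptions: "monthly" is approximated as 30 days, and the model identifiers are placeholder strings, not real model names.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=30)  # assumption: "monthly" approximated as 30 days


def maintenance_due(
    last_review: date, today: date, tested_model: str, current_model: str
) -> bool:
    """True if the model has changed or the review interval has elapsed."""
    model_changed = current_model != tested_model
    interval_elapsed = today - last_review >= REVIEW_INTERVAL
    return model_changed or interval_elapsed


# "model-a" and "model-b" are placeholder identifiers.
print(maintenance_due(date(2020, 10, 1), date(2020, 10, 15), "model-a", "model-a"))
print(maintenance_due(date(2020, 10, 1), date(2020, 10, 15), "model-a", "model-b"))
```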

Running the Plays in Sequence

The order is the point. Run them out of sequence and the system breaks.

The canonical sequence

  • Set up once: Play 1 (standard), then Play 6's cadence (maintenance schedule).
  • Per task: Play 2 (ground) → Play 3 (instruct) → Play 4 (verify).
  • On failure: Play 5 (escalate) → feed back into Plays 1 and 3.

A team running only Play 3—the citation instruction—gets confident, sometimes-fabricated references with no safety net. A team running all six gets verifiable output that improves over time. The plays are cheap individually; their value comes from running them as a connected loop.
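
The per-task and on-failure sequence can be sketched as a single function that fixes the order of the steps. The function name `run_task` and every parameter are hypothetical. The model call is passed in as `generate` because this article does not specify a model API.

```python
from typing import Callable


def run_task(
    sources: dict[str, str],
    task: str,
    ground: Callable[[dict[str, str]], str],
    instruct: Callable[[str, str], str],
    generate: Callable[[str], str],
    verify: Callable[[str, dict[str, str]], list],
    escalate: Callable[[list], None],
):
    """Run Plays 2 to 4 in order and hand any problems to Play 5."""
    context = ground(sources)           # Play 2: ground the task
    prompt = instruct(context, task)    # Play 3: issue the citation instruction
    output = generate(prompt)           # model call, supplied by the caller
    problems = verify(output, sources)  # Play 4: verify before anything ships
    if problems:
        escalate(problems)              # Play 5: correct, log, look for patterns
    return output, problems
```

Because verification always runs before the function returns, a caller cannot ship output without seeing the list of problems.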

A worked example

Consider a research summary headed to a client. The prompt-quality lead has already run Play 1, so a standard exists, and Play 6's cadence is on the calendar. The analyst runs Play 2 by retrieving the relevant source documents into context, then Play 3 by pasting the shared citation block with its honesty clause. The model returns a summary with quoted snippets attached to each claim. The reviewer runs Play 4, reading every snippet and checking the two load-bearing statistics against the cited passages. One checks out; the other is attached to a source that gives a different figure. That triggers Play 5: the reviewer corrects the claim, logs the mismatch, and notes that quantitative claims in this document type are a recurring weak spot. At the next monthly review under Play 6, the lead sees three similar entries and updates the standard prompt block to require an explicit recompute step for figures. That is the loop working: one caught error becomes a permanent improvement.

Frequently Asked Questions

Which play do most teams skip?

Play 5 (escalation) and Play 6 (maintenance). Teams set up a standard and a prompt, then never close the loop when failures appear or re-test after a model change. The result is a system that looks fine on day one and quietly degrades. The feedback loop is what separates a playbook from a one-time setup.

Can a small team run all six plays?

Yes, scaled down. A two-person team can keep the standard to half a page, make verification a quick self-check, and do maintenance in fifteen minutes a month. The plays are about discipline, not headcount. What matters is that grounding, instruction, and verification all happen—not how formal the process looks.

How is this different from just having a good prompt?

A good prompt is Play 3. The playbook adds the standard that defines what good means, the grounding that gives citations something to point at, the verification that catches failures, and the maintenance that keeps it working. A prompt alone produces citations; the system makes them trustworthy.

Who should own the playbook?

A single prompt-quality lead owns the standard, maintenance, and escalation handling, while individual authors and reviewers own grounding, instruction, and verification on their work. Distributed execution with centralized ownership of the standard is the structure that scales without fragmenting.

What triggers a standard update?

A pattern of similar failures found during verification or maintenance, or a model change that alters citation behavior. One-off errors get corrected and logged; recurring ones change the standard or prompt block. Tie updates to evidence from the escalation log rather than to opinion.

How long until this feels routine?

A few weeks of consistent use. The first pass is deliberate; after running the per-task sequence a dozen times, ground-then-instruct-then-verify becomes automatic. Maintenance stays a scheduled activity rather than a habit, which is why it needs a named owner and a calendar reminder.

Key Takeaways

  • Source-citing is a set of sequenced plays, not a single instruction; teams fail by running only the citation prompt.
  • Establish a written standard first (Play 1), then ground every factual task with real material (Play 2) before issuing the citation instruction (Play 3).
  • Verify in tiers matched to stakes (Play 4), and escalate failures into a log that drives standard updates (Play 5).
  • Maintain the system on a monthly cadence and re-test after model changes (Play 6)—the most-skipped plays are escalation and maintenance.
  • Run the plays as a connected loop: ground, instruct, verify per task; escalate on failure; maintain on schedule. The loop, not any single play, is what makes citations trustworthy.
