The SOURCE Model for Reliable Retrieval-Backed Answers

Agency Script Editorial

Editorial Team

September 21, 2022 · 8 min read

Most teams approach grounding as a bag of tactics: chunk the documents, retrieve some passages, write a careful instruction. The tactics are sound, but without a structure to hang them on, it is hard to know which one to reach for when answers go wrong. A framework fixes that. It gives you named stages, so when something fails you can point to the stage responsible instead of flailing across the whole pipeline.

This article introduces the SOURCE model, a reusable structure for any grounded system. The name is a mnemonic for its six stages: Select, Organize, Unite, Restrict, Cite, and Evaluate. Each stage has a job, a common failure, and a clear handoff to the next. The model is deliberately simple, because a framework you cannot remember is one you will not use.

Walk through the six stages once and you will have a map. After that, every grounding decision you face has a home, and every failure has an address. That addressing is the whole point of a framework: it converts the vague feeling that something is wrong into a precise question about which stage is responsible, and a precise question is one you can actually answer.

Select: Getting the Right Facts In Front of the Model

The Job of This Stage

Select is retrieval: given a question, find the passages most likely to contain the answer. Everything downstream depends on this stage doing its job, because the model can only reason over what Select hands it.

The Failure to Watch

The classic failure is returning passages that do not contain the answer. Catch it by inspecting retrieved chunks directly, before the model runs. If Select fails, no later stage can recover. This is why retrieval inspection leads the workflow in Build a Grounded Prompt Pipeline in Eight Concrete Steps. Because Select sits at the head of the chain, time spent strengthening it returns more than time spent anywhere else, and teams that obsess over prompt wording while neglecting Select are optimizing the wrong stage.
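As a rough illustration of inspecting retrieval before the model runs, the sketch below uses a toy word-overlap retriever. The functions `select` and `contains_answer`, the sample chunks, and the overlap scoring are all assumptions made for this example; a real system would use its own keyword or vector search.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real retriever's scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def contains_answer(retrieved: list[str], expected: str) -> bool:
    """Inspection step: does any retrieved chunk contain the known answer?"""
    return any(expected.lower() in c.lower() for c in retrieved)

# Hypothetical corpus and question, for illustration only.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping takes three to five business days.",
]
retrieved = select("How many days do refunds take?", chunks, k=1)
print(retrieved)
print("answer present:", contains_answer(retrieved, "14 days"))
```

If `contains_answer` returns False for a question whose answer you know is in the corpus, the problem is in Select (or in Organize behind it), and no prompt change will fix it.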

Organize: Shaping Documents So Retrieval Can Find Them

The Job of This Stage

Organize covers chunking and indexing, the preparation that makes Select possible. How you split documents determines whether retrieval can find coherent, answer-bearing passages at all.

The Failure to Watch

Splitting on raw character counts cuts ideas in half, so retrieval returns fragments. Organize correctly by splitting on natural boundaries with slight overlap. A weak Organize stage quietly sabotages a strong Select.
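One way to split on natural boundaries with slight overlap is to group whole paragraphs and repeat the last paragraph of each chunk at the start of the next. The function `chunk_paragraphs`, its `max_chars` budget, and the blank-line paragraph convention are assumptions for this sketch, not a prescribed method.

```python
def chunk_paragraphs(text: str, max_chars: int = 200, overlap: int = 1) -> list[str]:
    """Group whole paragraphs into chunks of roughly max_chars, repeating the
    last `overlap` paragraphs of each chunk at the start of the next one."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)  # paragraphs are never cut in half
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Hypothetical document with three short paragraphs.
doc = (
    "Refunds are issued within 14 days.\n\n"
    "Returns need a receipt.\n\n"
    "Shipping takes five days."
)
for chunk in chunk_paragraphs(doc, max_chars=70):
    print(repr(chunk))
```

A single paragraph longer than `max_chars` is kept whole here rather than split; handling that case (for example, by falling back to sentence boundaries) is left out to keep the sketch short.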

Unite: Assembling Context Into a Coherent Prompt

The Job of This Stage

Unite combines the retrieved passages, the instruction, and the question into a single prompt. Its job is arrangement: keeping context lean, marking it clearly, and ordering passages so the strongest sits where the model weights it most, which is often near the start or end of the context.

The Failure to Watch

The common error is uniting too much, stuffing in twenty passages when four would do. Excess context dilutes attention and raises cost. Unite favors precision over volume.
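A minimal sketch of Unite might cap the number of passages and label each one so it can be referenced later. The function `unite`, the bracketed numbering, and the default cap of four are illustrative assumptions; it also assumes passages arrive already ranked best-first.

```python
def unite(question: str, passages: list[str], instruction: str,
          max_passages: int = 4) -> str:
    """Assemble instruction, labeled context, and question into one prompt."""
    kept = passages[:max_passages]  # assumes passages are ranked best-first
    blocks = [f"[{i}] {p}" for i, p in enumerate(kept, start=1)]
    return "\n\n".join([
        instruction,
        "Context:",
        "\n".join(blocks),
        f"Question: {question}",
    ])

# Hypothetical inputs: six candidate passages, only four are kept.
passages = [f"passage {i}" for i in range(1, 7)]
print(unite("What is the refund window?", passages, "Use only the context."))
```

Keeping the cap as a parameter makes it a single variable you can change and measure, rather than something buried in prompt text.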

Restrict: Keeping the Model Inside the Evidence

The Job of This Stage

Restrict is the instruction that confines the model to the supplied context and permits it to decline when the answer is absent. It is the guardrail that stops the model from blending in training knowledge.

The Failure to Watch

Skipping Restrict lets the model fabricate fluently, presenting guesses as sourced facts. The fix is one or two explicit sentences. This guardrail is the corrective for the most damaging mistake in 7 Common Mistakes with Grounding Prompts with Retrieved Context.
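The "one or two explicit sentences" could look something like the following. The exact wording, the constant names, and the fixed decline phrase are assumptions for illustration; using a fixed phrase simply makes declines easy to detect in code.

```python
# Hypothetical decline phrase; any fixed, recognizable wording would do.
DECLINE = "I don't know based on the provided context."

# Hypothetical Restrict instruction: confine to context, permit declining.
RESTRICT_INSTRUCTION = (
    "Answer using only the context passages provided below. "
    f'If the context does not contain the answer, reply exactly: "{DECLINE}"'
)

def is_decline(answer: str) -> bool:
    """True when the model used the agreed decline phrase."""
    return answer.strip() == DECLINE

print(RESTRICT_INSTRUCTION)
```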

Cite: Making Every Claim Traceable

The Job of This Stage

Cite requires the model to attribute each claim to the specific chunk that supports it. Citation turns answers from opaque assertions into verifiable statements.

The Failure to Watch

Omitting Cite hides fabrication, because an invented claim looks identical to a sourced one in fluent prose. With Cite in place, a claim with no matching source stands out immediately. The trust this builds is explored in Grounding Prompts with Retrieved Context: Best Practices That Actually Work.
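To show how a claim with no matching source can be made to stand out, the sketch below audits an answer for sentences that cite nothing or cite a chunk that was never supplied. It assumes a hypothetical convention in which citations appear as bracketed numbers such as `[1]` placed before the sentence's final punctuation; `audit_citations` is a name invented for this example.

```python
import re

def audit_citations(answer: str, num_chunks: int) -> list[str]:
    """Return sentences with no citation, or with a chunk id outside 1..num_chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    flagged = []
    for sentence in sentences:
        ids = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not ids or any(i < 1 or i > num_chunks for i in ids):
            flagged.append(sentence)
    return flagged

# Hypothetical model answer, with two chunks supplied as context.
answer = "Refunds take 14 days [1]. Shipping is free. Returns need a receipt [7]."
print(audit_citations(answer, num_chunks=2))
```

This only checks that citations exist and point at real chunks; whether the cited chunk actually supports the claim is a separate check.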

Evaluate: Measuring Whether the System Actually Works

The Job of This Stage

Evaluate runs a standing set of real questions with known answers after every change, converting impressions into measurements. It is the stage that tells you whether a tweak helped.

The Failure to Watch

Evaluating on a single happy example breeds false confidence that collapses on real traffic. Build a varied test set and change one variable at a time so each result is attributable.
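A standing test set can be as simple as a list of question and expected-answer pairs scored after every change. In this sketch, `evaluate`, the substring-match scoring, and the stub `answer_fn` are assumptions; a real pipeline would plug in its own end-to-end answer function and likely a stricter scoring rule.

```python
from typing import Callable

def evaluate(answer_fn: Callable[[str], str],
             test_set: list[tuple[str, str]]) -> float:
    """Fraction of questions whose answer contains the expected string."""
    hits = sum(
        expected.lower() in answer_fn(question).lower()
        for question, expected in test_set
    )
    return hits / len(test_set)

# Hypothetical stub standing in for the full grounded pipeline.
def stub_answer(question: str) -> str:
    canned = {
        "How long do refunds take?": "Refunds take 14 days.",
        "What is needed for returns?": "A receipt is required.",
        "Is shipping free?": "I don't know.",
    }
    return canned.get(question, "")

test_set = [
    ("How long do refunds take?", "14 days"),
    ("What is needed for returns?", "receipt"),
    ("Is shipping free?", "over $50"),
]
print(f"score: {evaluate(stub_answer, test_set):.2f}")
```

Recording this score before and after each single-variable change is what makes a result attributable.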

Applying the Whole Model

Diagnose by Stage

When a grounded answer is wrong, walk the stages in order. Did Select return the right chunks? Did Organize give it good material to work with? Did Unite arrange them well? Was Restrict in place? Did Cite expose anything? Did Evaluate catch the regression? The first stage that fails is where your fix belongs.
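The walk described above can be sketched as an ordered list of checks that stops at the first failure. The function `first_failing_stage` and the lambda checks are placeholders; in practice each check would wrap a real inspection, such as confirming the expected answer appears in the retrieved chunks.

```python
from typing import Callable, Optional

def first_failing_stage(
    checks: list[tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails."""
    for name, check in checks:
        if not check():
            return name
    return None  # every stage passed

# Hypothetical results for one wrong answer: retrieval was fine, chunking was not.
checks = [
    ("Select", lambda: True),
    ("Organize", lambda: False),
    ("Unite", lambda: False),
    ("Restrict", lambda: True),
    ("Cite", lambda: True),
    ("Evaluate", lambda: True),
]
print(first_failing_stage(checks))
```

Stopping at the first failure matters: later checks may also fail, but only as a consequence of the earlier one.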

Improve One Stage at a Time

Because the stages are distinct, you can strengthen them independently. Improving Select rarely requires touching Restrict. This separation is what makes the model a working tool rather than a slogan, and it keeps your tuning disciplined.

Where the Stages Interact

Upstream Stages Constrain Downstream Ones

The stages are separable for diagnosis but not independent in effect. A weak Organize stage caps how good Select can ever be, because retrieval cannot find a coherent passage that chunking never produced. Likewise, no amount of careful Unite or Restrict can rescue an answer when Select handed over the wrong material. This is why diagnosis walks from the earliest stage forward: an upstream failure makes everything after it look broken, and fixing a downstream stage while an upstream one is failing wastes effort.

Restrict and Cite Reinforce Each Other

Restrict and Cite are technically separate, but they compound. Restrict tells the model to stay within the evidence; Cite forces it to show which evidence it used. Together they create a feedback loop you can inspect: if a cited claim does not actually appear in the cited chunk, you have caught a Restrict violation that Cite made visible. Neither stage alone gives you that. Run them as a pair and you gain a self-checking property that pure instruction cannot provide.
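The self-checking property can be approximated in code by comparing each claim with the chunk it cites. The sketch below uses a crude word-coverage score; `unsupported_claims`, the 0.6 threshold, and the (claim, chunk id) input format are assumptions for illustration, and a production check would likely need something stronger than word overlap.

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set used for a rough overlap comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unsupported_claims(claims: list[tuple[str, int]],
                       chunks: dict[int, str],
                       threshold: float = 0.6) -> list[str]:
    """Flag claims whose words mostly do not appear in the chunk they cite."""
    flagged = []
    for text, chunk_id in claims:
        claim_words = words(text)
        chunk_words = words(chunks.get(chunk_id, ""))
        coverage = (
            len(claim_words & chunk_words) / len(claim_words)
            if claim_words else 0.0
        )
        if coverage < threshold:
            flagged.append(text)
    return flagged

# Hypothetical context chunk and two claims that both cite it.
chunks = {1: "Refunds are issued within 14 days of a return request."}
claims = [
    ("Refunds are issued within 14 days", 1),
    ("Refunds take 30 days and include free shipping", 1),
]
print(unsupported_claims(claims, chunks))
```

A flagged claim here is the case described above: the citation is present, but the cited chunk does not contain the claim, which points to a Restrict violation.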

Evaluate Watches the Whole Chain

Evaluate is the only stage that sees the end-to-end result, which makes it your detector for problems that no single upstream stage reveals. A regression that emerges only from the interaction of two stages (a chunking change that subtly degrades retrieval, for instance) surfaces in Evaluate before it surfaces anywhere else. Treat the standing test set as the integration test for the entire SOURCE pipeline, not merely a check on the model's wording.

Frequently Asked Questions

Do the stages have to run in this exact order?

The mnemonic order follows diagnostic priority, not strict execution order. In dependency terms, Organize enables Select, Select feeds Unite, and so on. You build the stages in roughly that dependency order, but at run time Select through Cite happen together for each question, with Organize and Evaluate as surrounding activities.

Where do most failures land in the SOURCE model?

In Select, by a wide margin. Retrieval returning the wrong passages is the most common root cause, which is why the model puts it first and why inspecting it is the first diagnostic step.

Is this framework tied to any particular tools?

No. SOURCE describes the work, not the technology. It applies whether you use keyword search or vector retrieval, a hosted model or a local one. The stages stay the same as tools change.

How is Restrict different from Cite?

Restrict keeps the model inside the supplied context; Cite makes its use of that context traceable. Restrict prevents fabrication, Cite reveals it. You want both, because each catches what the other misses.

Key Takeaways

  • SOURCE breaks grounding into six nameable stages: Select, Organize, Unite, Restrict, Cite, and Evaluate.
  • Each stage has a distinct job and a characteristic failure, so problems get an address instead of vague blame.
  • Select (retrieval quality) is where most failures originate and where diagnosis should begin.
  • Restrict and Cite work as a pair: one keeps the model in the evidence, the other makes its use traceable.
  • Diagnose by walking the stages in order and improve them one at a time for disciplined, attributable progress.
