AGENCYSCRIPT



Weighing the Real Choices in Retrieval-Backed Prompting


Agency Script Editorial

Editorial Team

October 27, 2022 · 8 min read

Every grounding decision is a trade-off in disguise. Should you use keyword search or semantic search? Retrieve many passages or few? Ground the model or fine-tune it instead? The right answer is almost never absolute; it depends on your documents, your questions, and what you are willing to pay in cost, latency, and effort. The teams that struggle are the ones chasing a single best configuration that does not exist. The teams that succeed learn the axes and decide deliberately.

This article lays out the major choices, the dimensions along which they differ, and a decision rule for each. The aim is not to tell you what to pick but to give you the reasoning to pick well, and to recognize when your situation has shifted enough to revisit an earlier call.

We will treat each decision as a pair of competing options and ask the same questions of each: what does it cost, what does it buy, and when does it win? Keep in mind that these axes interact. A choice to retrieve many passages, for instance, raises the stakes on chunk size, because large chunks plus many passages can overflow your prompt and your budget at once. Decide each axis with the others in view rather than in isolation, and revisit the whole set when any one of them changes materially.
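The interaction between passage count and chunk size comes down to simple arithmetic. The sketch below is illustrative only: the function names, the token budget, and the overhead figure are assumptions made for the example, not values taken from any particular model or system.

```python
def prompt_tokens(num_passages: int, tokens_per_chunk: int, overhead_tokens: int = 0) -> int:
    """Estimate prompt size: retrieved context plus fixed overhead
    (instructions and the user's question)."""
    return num_passages * tokens_per_chunk + overhead_tokens


def fits_budget(num_passages: int, tokens_per_chunk: int,
                budget_tokens: int, overhead_tokens: int = 0) -> bool:
    """True when the estimated prompt stays within the token budget."""
    return prompt_tokens(num_passages, tokens_per_chunk, overhead_tokens) <= budget_tokens


# Hypothetical numbers: a 4,000-token budget with 500 tokens of fixed overhead.
print(fits_budget(5, 200, 4000, 500))    # few small chunks: 5 * 200 + 500 = 1500
print(fits_budget(20, 800, 4000, 500))   # many large chunks: 20 * 800 + 500 = 16500
```

Running a check like this before changing either axis makes the coupling visible: doubling passages and doubling chunk size quadruples the context portion of the prompt.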

Keyword Search Versus Semantic Search

The Axes That Separate Them

Keyword search matches exact words; semantic search matches meaning. Keyword search is simple, transparent, and fast, and it shines when users ask using the same terms your documents use. Semantic search handles paraphrase and synonyms, but it adds the complexity of embeddings and can return passages that feel related without containing the answer.

The Decision Rule

If your vocabulary is controlled and your users speak the documents' language, start with keyword search; it is cheaper and easier to debug. Reach for semantic search when questions vary widely in wording. Many mature systems combine both, and the tooling for each is surveyed in Choosing the Stack Behind Retrieval-Grounded Prompts.
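One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion. The article does not name a specific method, so treat this as one possible sketch: the document ids are made up, and the constant `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same question.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Because the fusion works on ranks rather than raw scores, it avoids having to calibrate keyword scores against embedding similarities, which is part of why it stays easy to debug.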

Few Passages Versus Many

The Axes That Separate Them

Retrieving more passages raises the odds the answer is somewhere in the context, at the cost of diluting the model's attention, raising spend, and slowing responses. Retrieving fewer keeps the context sharp but risks missing the passage that mattered.

The Decision Rule

Default to few: retrieve the smallest set that reliably answers your test questions, and add more only when evaluation shows answers are missing detail. The instinct to pad context for safety usually backfires, as the failure modes in 7 Common Mistakes with Grounding Prompts with Retrieved Context demonstrate.
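Picking the smallest sufficient passage count can be expressed as a short search over candidate values. This is a minimal sketch under stated assumptions: `pass_rate_at_k` stands in for whatever evaluation you run over your test questions, and the pass rates below are invented placeholders, not measurements.

```python
from typing import Callable, Iterable, Optional


def smallest_sufficient_k(
    candidate_ks: Iterable[int],
    pass_rate_at_k: Callable[[int], float],
    target_pass_rate: float,
) -> Optional[int]:
    """Return the smallest passage count whose pass rate meets the target,
    or None if no candidate does."""
    for k in sorted(candidate_ks):
        if pass_rate_at_k(k) >= target_pass_rate:
            return k
    return None


# Placeholder pass rates keyed by passage count (illustrative only).
measured = {1: 0.62, 3: 0.91, 5: 0.93, 10: 0.90}
best_k = smallest_sufficient_k(measured, measured.get, target_pass_rate=0.90)
print(best_k)
```

The search stops at the first value that clears the bar rather than the value with the highest score, which matches the rule of adding context only when evaluation demands it.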

Grounding Versus Fine-Tuning

The Axes That Separate Them

Grounding supplies facts at query time and leaves the model untouched, so updating what the system knows means re-indexing documents rather than retraining. Fine-tuning bakes knowledge into the model's weights, which can make it more fluent in a domain but is expensive and slow to update, and it does not give you traceable sources.

The Decision Rule

When your knowledge changes often or you need to cite sources, ground. When you need the model to adopt a style or reason within a stable, rarely changing domain, fine-tuning may help. For keeping answers current and auditable, grounding wins in most situations, which is why the team in How One Support Team Cut Wrong Answers in Half chose it.
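Traceability is easiest to see in how a grounded prompt is assembled. The sketch below is one hypothetical layout: the instruction wording, the bracketed source-id convention, and the example passage are all assumptions for illustration.

```python
def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Assemble a prompt from (source_id, text) pairs so that every fact
    the model sees carries an id it can cite."""
    lines = ["Answer using only the sources below. Cite source ids in brackets.", ""]
    for source_id, text in passages:
        lines.append(f"[{source_id}] {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical passage; in practice this comes from your retriever.
prompt = build_grounded_prompt(
    "What is the refund window?",
    [("policy-2", "Refunds are accepted within 30 days.")],
)
print(prompt)
```

Updating the answer here means replacing the passage text in the index; nothing about the model changes, and the cited id points back to the document that supplied the fact.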

Small Chunks Versus Large Chunks

The Axes That Separate Them

Small chunks are precise and retrieve cleanly for narrow factual questions, but they can strip away the surrounding context that makes a passage meaningful. Large chunks preserve context and suit synthesis, but they waste prompt space and dilute relevance for simple lookups.

The Decision Rule

Size chunks to your dominant question type: small for factual lookup, larger for synthesis. If your questions span both, store more than one granularity and select based on the query rather than forcing a single value.
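Storing more than one granularity can be as simple as chunking the same text at several sizes and keeping each result under its own key. This is a rough sketch: it splits on whitespace and counts words, whereas a real system would more likely respect sentence or section boundaries, and the names and sizes are illustrative assumptions.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_multi_granularity(text: str, sizes: dict[str, int]) -> dict[str, list[str]]:
    """Chunk the same text once per named granularity."""
    return {name: chunk_words(text, size) for name, size in sizes.items()}


# Hypothetical sizes: small chunks for lookups, large chunks for synthesis.
stores = build_multi_granularity("a b c d e", {"small": 2, "large": 4})
print(stores["small"])
print(stores["large"])
```

At query time, a lookup-style question would search the small store and a synthesis-style question the large one; how you classify the question is a separate decision this sketch leaves open.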

A Middle Path Worth Knowing

There is a hybrid that sidesteps part of this tension: retrieve on small chunks for precision, then expand each match to include its surrounding context before handing it to the model. You get the clean retrieval of small chunks with the richer context of large ones, at the cost of a little extra machinery. It is not free, and it is not always worth it, but when your questions stubbornly span both lookup and synthesis, this approach often beats forcing a single chunk size.
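The expansion step can be sketched in a few lines. This assumes chunks are stored in document order and that the retriever returns the indices of the matching small chunks; the function name and the `window` parameter are hypothetical.

```python
def expand_matches(chunks: list[str], matched_indices: list[int], window: int = 1) -> list[str]:
    """Expand each matched small chunk to include `window` neighbors on
    each side, merging spans that overlap or touch."""
    spans = sorted(
        (max(0, i - window), min(len(chunks) - 1, i + window))
        for i in matched_indices
    )
    merged: list[list[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent to the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [" ".join(chunks[s:e + 1]) for s, e in merged]


chunks = ["a", "b", "c", "d", "e", "f", "g"]
print(expand_matches(chunks, [1, 5]))   # two separate expanded passages
print(expand_matches(chunks, [2, 3]))   # neighboring matches merge into one
```

Merging neighboring spans matters in practice: without it, two adjacent matches would send the same surrounding text to the model twice.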

Automated Evaluation Versus Human Review

The Axes That Separate Them

Automated scoring is fast and repeatable but blunt on nuance. Human review is accurate but slow and expensive, and it does not scale to every change.

The Decision Rule

Automate the regression checks you run constantly, and reserve human judgment for ambiguous or high-stakes answers. The blend gives you speed where you need volume and accuracy where you need care. Neither alone is sufficient.
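One way to encode that blend is a routing rule that sends an answer to a person only when it is high-stakes or when the automated score is inconclusive. The thresholds and the function name below are assumptions for illustration; where the ambiguous band sits depends on your scorer.

```python
def needs_human_review(auto_score: float, high_stakes: bool,
                       low: float = 0.4, high: float = 0.8) -> bool:
    """Route to a human when the answer is high-stakes, or when the
    automated score falls in the ambiguous band [low, high)."""
    return high_stakes or (low <= auto_score < high)


print(needs_human_review(0.9, high_stakes=False))   # clear pass, stays automated
print(needs_human_review(0.6, high_stakes=False))   # ambiguous, goes to a reviewer
print(needs_human_review(0.95, high_stakes=True))   # high stakes always reviewed
```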

Building Your Own Decision Frame

Name Your Constraints First

Before choosing anything, write down what you are optimizing for: currency of information, source traceability, cost, latency, and operational simplicity. The right choice on every axis above falls out of those priorities, and two teams with different priorities will rightly choose differently.

Revisit as Conditions Change

A decision that was correct when your corpus was small and stable may be wrong once it grows or starts changing weekly. Treat these trade-offs as living decisions, and rerun the reasoning when your situation shifts.

One-Shot Retrieval Versus Iterative Retrieval

The Axes That Separate Them

Simple grounding retrieves once and answers. Iterative approaches let the model retrieve, read, then retrieve again based on what it learned, chasing down detail across several rounds. Iteration handles complex, multi-part questions that a single retrieval cannot satisfy, but it multiplies cost and latency and adds moving parts that can fail.

The Decision Rule

Start with one-shot retrieval; it is simpler, cheaper, and sufficient for the large majority of questions. Adopt iterative retrieval only when you can point to specific questions that genuinely require gathering evidence in stages, and when you have the evaluation in place to prove the added complexity pays off. Complexity you cannot measure is complexity you will regret.

Watch the Compounding Failure Surface

Every extra retrieval round is another place the wrong chunk can enter and another place latency accumulates. The more rounds you add, the more important auditable retrieval becomes, because a failure three rounds deep is far harder to trace than one in a single-shot pipeline. If you go iterative, invest first in logging every round.
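A bounded retrieval loop that records every round might look like the sketch below. The `retrieve` and `next_query` callables are placeholders for your own retriever and for whatever logic (often a model call) decides the follow-up query; neither is a real library API, and the round limit is an arbitrary default.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("retrieval")


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],
    next_query: Callable[[str, list[str]], Optional[str]],
    max_rounds: int = 3,
) -> tuple[list[str], list[dict]]:
    """Retrieve in rounds, returning the gathered evidence and a per-round trace."""
    evidence: list[str] = []
    trace: list[dict] = []
    query: Optional[str] = question
    for round_no in range(1, max_rounds + 1):
        passages = retrieve(query)
        # Record what was asked and what came back before doing anything else.
        trace.append({"round": round_no, "query": query, "passages": passages})
        logger.info("round=%d query=%r passages=%d", round_no, query, len(passages))
        for passage in passages:
            if passage not in evidence:
                evidence.append(passage)
        query = next_query(question, evidence)
        if query is None:  # the follow-up logic decided there is enough evidence
            break
    return evidence, trace


# Toy stand-ins for a retriever and follow-up logic.
index = {"q": ["p1"], "follow-up": ["p1", "p2"]}
evidence, trace = iterative_retrieve(
    "q",
    retrieve=lambda query: index.get(query, []),
    next_query=lambda question, ev: "follow-up" if len(ev) == 1 else None,
)
print(evidence)
print([entry["query"] for entry in trace])
```

The trace gives you one record per round, so a wrong chunk that enters in a later round can be tied to the exact query that fetched it. The hard `max_rounds` cap keeps cost and latency bounded even when the follow-up logic never signals that it is done.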

Frequently Asked Questions

Is semantic search always better than keyword search?

No. Keyword search is simpler, cheaper, and easier to debug, and it works well when users phrase questions in the documents' own terms. Semantic search earns its complexity only when question wording varies widely.

When does fine-tuning beat grounding?

When you need the model to adopt a particular style or reason within a stable domain that rarely changes, and you do not need traceable sources. For current, citable answers over evolving content, grounding is the stronger default.

How do I decide how many passages to retrieve?

Let evaluation decide. Start with a small number, then increase only if your test questions show answers missing detail. Padding context preemptively lowers quality and raises cost more often than it helps.

Can I avoid the chunk size trade-off entirely?

Partly. Storing multiple granularities and selecting per query sidesteps a single forced value, at the cost of more storage and complexity. For most teams, sizing to the dominant question type is simpler and good enough.

Key Takeaways

  • Grounding decisions are trade-offs, not absolutes; the right choice depends on your documents, questions, and priorities.
  • Start with keyword search and few passages, adding semantic search and more context only when evaluation justifies it.
  • Choose grounding over fine-tuning when knowledge changes often or you need traceable sources.
  • Size chunks to your dominant question type, and consider storing multiple granularities when questions vary.
  • Name your constraints first, let the choices fall out of them, and revisit the reasoning as conditions change.


