
Hard-Won Rules for Keeping AI Answers Grounded

Agency Script Editorial, Editorial Team

December 27, 2023 · 7 min read

There is no shortage of generic advice about reducing hallucinations. "Be specific." "Provide context." Most of it is true but useless, because it does not tell you what to actually do or why it works. This article takes the opposite stance: a set of opinionated practices, each with the reasoning that earns it a place and the trade-off it carries.

These are not rules to follow blindly. They are positions arrived at by watching prompts fail in production and figuring out what reliably fixed them. Where a practice has a downside, we say so, because a practice you apply without understanding its cost is a practice you will misapply.

If you want the foundational concepts first, start with "Stop Your Model From Inventing Facts at the Prompt Layer." If you already know the basics, read on.

Default to Grounding, Treat Memory as a Last Resort

The strongest practice is to assume the model's memory is unreliable for any specific fact and design around that assumption.

Why this is the right default

Parametric memory is lossy, dated, and confidently wrong on specifics. Every factual answer drawn from memory is a guess wearing the costume of a fact. Grounding the model in supplied text converts the guess into a lookup.
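Here is a minimal sketch of what "grounding" looks like at the prompt layer, assuming your sources arrive as a plain list of passage strings. The function name `build_grounded_prompt` and the `[P1]`, `[P2]` labelling scheme are illustrative choices, not a standard. Numbering the passages matters later, because citation checks need a stable ID to point at.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied passages.

    Hypothetical helper: the wording and the [P<n>] labels are assumptions.
    """
    # Give each passage a stable ID so answers (and later checks) can cite it.
    numbered = "\n".join(f"[P{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using ONLY the passages below. "
        "Do not rely on background knowledge for names, numbers, dates, "
        "or other specifics.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\n"
    )
```

The instruction does the restricting; the numbered passages give the model something to look up instead of something to remember.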

The trade-off

Grounding requires you to have and supply the source material, which adds retrieval infrastructure and prompt length. For tasks where no source exists, you fall back to memory and accept higher risk—so reserve those tasks for low-stakes use.

Make Abstention a First-Class Outcome

Treat "I do not know" as a valid, desirable answer, not a failure state to be engineered away.

Why this matters

A model with no exit answers everything, including questions it cannot support, and fills gaps with invention. Granting explicit, concrete permission to abstain is one of the highest-leverage single lines you can add to a prompt.
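One way to make the permission concrete is to give abstention a fixed, machine-checkable form. The sketch below assumes a hypothetical sentinel string, `INSUFFICIENT_EVIDENCE`; any unambiguous token would do. A fixed sentinel is easier to count in evaluation than free-text variations of "I don't know."

```python
# Hypothetical sentinel; the exact string is an assumption, not a convention.
ABSTAIN_TOKEN = "INSUFFICIENT_EVIDENCE"

# One line appended to the prompt: explicit, concrete permission to abstain.
ABSTENTION_CLAUSE = (
    "If the passages do not contain the answer, reply with exactly "
    f"{ABSTAIN_TOKEN} and nothing else. This is an acceptable, expected "
    "answer; do not guess."
)


def is_abstention(response: str) -> bool:
    """Return True when the model replied with the agreed sentinel."""
    return response.strip() == ABSTAIN_TOKEN
```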

The trade-off

Push abstention too hard and the model refuses questions it could have answered, frustrating users. The practice is to calibrate, not to maximize abstention: measure unnecessary refusals alongside fabrications. This balance is covered in "Build a Fabrication-Resistant Prompt in Eight Moves."

Require Evidence, Then Use the Absence of It

Demand that every claim cite the source passage that supports it, and treat unsupported claims as signals.

Why this works

A citation requirement forces a self-check before the model commits. When a claim has no supporting passage, the gap becomes visible, and a well-prompted model abstains rather than fabricating to fill it.
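If the prompt asks for a passage ID on every sentence, the "visible gap" can be checked mechanically. This is a rough sketch, assuming answers use `[P<n>]` tags placed before the sentence's closing punctuation (for example, `Revised in March [P1].`). The sentence splitter is deliberately naive and would need hardening for abbreviations or decimals.

```python
import re

CITATION = re.compile(r"\[P(\d+)\]")


def uncited_sentences(answer: str, num_passages: int) -> list[str]:
    """Flag sentences that cite no passage or cite a passage ID that does not exist."""
    flagged = []
    # Naive split: terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        ids = [int(n) for n in CITATION.findall(sentence)]
        if not ids or any(i < 1 or i > num_passages for i in ids):
            flagged.append(sentence)
    return flagged
```

Flagged sentences are exactly the "signals" described above: candidates to drop, or reasons to send the question back with the abstention option.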

The trade-off

Models can fabricate citations too, quoting passages that do not actually support the claim. Evidence requirements reduce fabrication but do not eliminate the need for verification, especially on high-stakes output.
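A cheap first filter for fabricated citations is to require a short verbatim quote with each citation and confirm the quote really appears in the cited passage. The sketch assumes a hypothetical `"quoted text" [P<n>]` format. It only catches quotes that do not exist; a real quote that fails to support the claim still passes, which is why this complements rather than replaces a verification pass.

```python
import re

# Assumed answer format: "verbatim quote" [P<n>]
QUOTED_CITATION = re.compile(r'"([^"]+)"\s*\[P(\d+)\]')


def _normalize(text: str) -> str:
    # Ignore case and whitespace differences so formatting alone is not a failure.
    return " ".join(text.lower().split())


def unsupported_quotes(answer: str, passages: list[str]) -> list[tuple[str, int]]:
    """Return (quote, passage_id) pairs whose quote is not found in the cited passage."""
    problems = []
    for quote, pid in QUOTED_CITATION.findall(answer):
        idx = int(pid)
        # An out-of-range ID is treated as citing an empty passage.
        passage = passages[idx - 1] if 1 <= idx <= len(passages) else ""
        if _normalize(quote) not in _normalize(passage):
            problems.append((quote, idx))
    return problems
```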

Separate Generation From Verification

Do not trust a single prompt to both produce an answer and confirm it is correct.

Why the split matters

A model evaluating its own fresh output tends to rationalize rather than scrutinize. A separate verification pass, framed independently, catches errors the generation step approved. Two passes are meaningfully more reliable than one.
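A sketch of the split, assuming a hypothetical `call_model(prompt) -> str` function that wraps whatever model API you use. The key design choice is that the verifier prompt is framed independently: it sees only the sources and the finished answer, not the generation prompt, and it is asked to check someone else's work. The `high_stakes` flag reflects the trade-off below, so the second call is only paid for when it matters.

```python
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt, returns the model's text reply.
ModelFn = Callable[[str], str]


def generate(question: str, sources: str, call_model: ModelFn) -> str:
    return call_model(
        f"Sources:\n{sources}\n\nAnswer using only the sources.\nQuestion: {question}"
    )


def verify(answer: str, sources: str, call_model: ModelFn) -> bool:
    # Independent framing: no access to the generation prompt or its reasoning.
    verdict = call_model(
        "You are checking someone else's answer against the sources.\n"
        f"Sources:\n{sources}\n\nAnswer:\n{answer}\n\n"
        "Reply SUPPORTED if every claim is backed by the sources, otherwise UNSUPPORTED."
    )
    return verdict.strip().upper() == "SUPPORTED"


def answer_question(
    question: str, sources: str, call_model: ModelFn, high_stakes: bool
) -> Optional[str]:
    draft = generate(question, sources, call_model)
    if not high_stakes:
        return draft  # errors are cheap: skip the second call
    # None means "withhold": the verifier did not confirm the draft.
    return draft if verify(draft, sources, call_model) else None
```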

The trade-off

A second pass roughly doubles the cost and latency of each answer. Reserve it for tasks where a confident wrong answer causes real harm, and skip it where errors are cheap.

Constrain the Output, Not Just the Input

A tight output structure does as much to suppress fabrication as a careful input.

Why structure helps

Open-ended prose gives the model room to embellish. Defined fields, bounded lists, and required slots channel the model into the shape you want and starve the freelancing impulse. Length correlates with invention, so shorter, structured output drifts less.

The trade-off

Over-constraining can force the model to produce output it cannot support, jamming a guess into a required field. Pair structure with abstention so empty slots are allowed to stay empty.
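Structure and abstention can be enforced together when parsing the model's output. This sketch assumes a hypothetical contract-summary task whose output is a JSON object with three fixed fields and a bounded list. Every field may be `null`, which is how an empty slot stays empty instead of being filled with a guess; unexpected keys are rejected as freelancing.

```python
import json
from typing import Any

# Hypothetical output contract: field names and the risk limit are illustrative.
STRING_FIELDS = ("vendor", "contract_end_date", "renewal_terms")
MAX_RISKS = 3


def parse_structured(raw: str) -> dict[str, Any]:
    """Parse model output and raise ValueError if it breaks the contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    expected = set(STRING_FIELDS) | {"risks"}
    if set(data) != expected:
        # Extra keys mean embellishment; missing keys mean the shape was ignored.
        raise ValueError(f"key mismatch: {sorted(set(data) ^ expected)}")
    for key in STRING_FIELDS:
        # null is the abstention value for a slot the sources do not support.
        if data[key] is not None and not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string or null")
    risks = data["risks"]
    if not isinstance(risks, list) or len(risks) > MAX_RISKS:
        raise ValueError(f"risks must be a list of at most {MAX_RISKS} items")
    if not all(isinstance(r, str) for r in risks):
        raise ValueError("each risk must be a string")
    return data
```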

Test Where Fabrication Actually Lives

Build evaluation around the questions your source cannot answer, because that is where hallucination shows up.

Why this is non-negotiable

A prompt that handles answerable questions tells you nothing about its fabrication rate. The risk is concentrated in unanswerable cases, and a prompt that abstains on all of them is doing its job.

The trade-off

Assembling and maintaining a labeled test set with unanswerable cases takes effort that feels unglamorous. But without it, every prompt change is a guess, and you will routinely fix one failure while creating another.
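A minimal evaluation harness might look like the sketch below. It assumes each test case carries a hand-applied label saying whether the source can answer it, and a hypothetical `run_prompt(question)` that returns `None` when the prompt abstains. It measures only the two failure modes discussed here; grading whether answers to answerable questions are actually correct would need reference answers on top.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Case:
    question: str
    answerable: bool  # label: does the source actually contain the answer?


def evaluate(
    cases: list[Case], run_prompt: Callable[[str], Optional[str]]
) -> dict[str, float]:
    """Compute fabrication and unnecessary-refusal rates; None means abstained."""
    answerable = [c for c in cases if c.answerable]
    unanswerable = [c for c in cases if not c.answerable]
    # Answering an unanswerable question counts as a fabrication.
    fabrications = sum(run_prompt(c.question) is not None for c in unanswerable)
    # Abstaining on an answerable question counts as an unnecessary refusal.
    refusals = sum(run_prompt(c.question) is None for c in answerable)
    return {
        "fabrication_rate": fabrications / len(unanswerable) if unanswerable else 0.0,
        "unnecessary_refusal_rate": refusals / len(answerable) if answerable else 0.0,
    }
```

Rerunning the same labeled set after every prompt change is what turns "I think this is better" into a measured comparison.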

Sequencing the Practices

The practices above are not a menu to pick from at random. They have a natural order, and applying them out of sequence wastes effort.

Grounding comes before everything

There is no point requiring evidence or running verification if the model has no source to ground in. Secure the source material first, restrict the model to it, and only then layer on the practices that depend on that foundation. A team that adds verification before fixing grounding is polishing a guess.

Abstention and structure come next

Once grounded, add the abstention clause and constrain the output shape. These two work together: structure tells the model where to put answers, and abstention tells it that empty is an acceptable value. Apply them as a pair, because structure without an exit forces the model to jam a guess into a required slot.

Verification and testing close the loop

Evidence requirements and a verification pass are the last line, reserved for stakes that justify their cost. Testing on unanswerable questions wraps around all of it, because every other practice is unproven until you have measured it against the cases where fabrication lives. The sequence, end to end, mirrors the staged build in "Build a Fabrication-Resistant Prompt in Eight Moves."

Frequently Asked Questions

What is the single most important practice?

Defaulting to grounding—treating the model's memory as unreliable and supplying source material for any specific fact. It addresses the root cause of most fabrication and converts a guess into a lookup. Everything else builds on top of it.

Are these practices model-specific?

No. They target how generation works, which is common across models. A newer or larger model may hallucinate somewhat less, but grounding, abstention, evidence requirements, and verification improve results on any model. They are portable, which is part of why they are worth investing in.

How do I balance abstention against usefulness?

Measure both. Track fabrications and unnecessary abstentions on a test set, and tune the abstention clause until both are low. The target is calibration—answering when the source supports it, abstaining when it does not—rather than maximizing either accuracy or caution in isolation.
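Once both rates are measured for several wordings of the abstention clause, you still need a rule for choosing between them. The sketch below shows one possible rule, not the only one: set a budget for unnecessary refusals, then pick the variant with the fewest fabrications within that budget. The variant names and the `max_refusal` budget are hypothetical inputs you would supply from your own test runs.

```python
def pick_variant(results: dict[str, dict[str, float]], max_refusal: float) -> str:
    """Choose the clause variant with the lowest fabrication rate among those
    whose unnecessary-refusal rate stays within max_refusal."""
    eligible = {
        name: m for name, m in results.items()
        if m["unnecessary_refusal_rate"] <= max_refusal
    }
    if not eligible:
        raise ValueError("no variant meets the refusal budget; revise the clauses")
    # Tie-break on refusals so equal fabrication rates favor the more useful variant.
    return min(
        eligible,
        key=lambda name: (
            eligible[name]["fabrication_rate"],
            eligible[name]["unnecessary_refusal_rate"],
        ),
    )
```

A budget-then-minimize rule keeps either metric from being optimized in isolation: a clause that abstains on everything is excluded by the refusal budget, and one that never abstains loses on fabrications.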

Can required citations be trusted completely?

No. Models can fabricate citations or quote passages that do not actually support the claim. Citations sharply reduce fabrication and surface gaps, but on high-stakes output you still want a separate verification pass to confirm the cited source genuinely supports the answer.

When is a verification pass not worth it?

When errors are cheap and the task is low-stakes. The second pass roughly doubles cost and latency, so for casual or easily corrected output it is overkill. Reserve it for cases where a confident wrong answer creates real risk or liability.

Key Takeaways

  • Default to grounding and treat the model's memory as an unreliable last resort for any specific fact.
  • Make abstention a first-class, desirable outcome, then calibrate it so the model does not refuse answerable questions.
  • Require evidence for every claim and treat unsupported claims as a signal to abstain, while remembering citations can be faked.
  • Separate generation from verification for high-stakes tasks, accepting the added cost and latency.
  • Build your testing around unanswerable questions, where fabrication actually lives, and tune toward calibration rather than any single extreme.
