Five Scenarios Where Context Choices Made or Broke It

Agency Script Editorial

Editorial Team

October 10, 2023 · 7 min read
context engineering, context engineering examples, context engineering guide, prompt engineering

Principles are easier to remember when you have seen them play out. This article walks through concrete context engineering scenarios across different kinds of AI features, showing the specific decision that made each one work or fail. None of these are exotic; they are the everyday situations teams hit when moving an AI feature from a clever demo to something that holds up.

For each scenario you will see the setup, what went wrong or right, and the underlying lesson. The scenarios are illustrative rather than tied to any single organization, but the patterns are real and recurring. Read them as a way to recognize your own situation in someone else's.

The thread connecting all of them is the same: the model's behavior was determined less by its raw capability than by what it could see at the moment it answered. Change the context, change the outcome. As you read, notice how often the team's first explanation blamed the model and how often the real cause turned out to be a fixable property of the context. That gap between the assumed cause and the actual one is where most of the wasted effort in AI development lives.

A Support Assistant That Cited the Wrong Policy

A customer support assistant kept confidently quoting outdated refund rules.

What Was Happening

The retrieval step pulled policy documents by keyword match. Both the current and a superseded policy mentioned refunds, and the older one ranked higher, so the model grounded its answer on stale text.

The Lesson

Retrieval quality is the ceiling on answer quality. No prompt change would have fixed this, because the model was reasoning correctly over the wrong evidence. The fix was tagging documents with effective dates and filtering retrieval to current policies. The broader principle is covered in Master Context Engineering Without Guesswork.
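
The date filter described above might look like the following minimal sketch. The `PolicyDoc` fields, the `current_policies` helper, and the sample policies are all hypothetical; the scenario does not specify how effective dates were stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PolicyDoc:
    title: str
    text: str
    effective_from: date
    superseded_on: Optional[date] = None  # None means still in force


def current_policies(docs: List[PolicyDoc], today: date) -> List[PolicyDoc]:
    """Keep only documents in force on `today`, before any ranking happens."""
    return [
        d for d in docs
        if d.effective_from <= today
        and (d.superseded_on is None or today < d.superseded_on)
    ]


# Made-up sample data for illustration only.
docs = [
    PolicyDoc("Refund policy v1", "Refunds within 60 days.",
              date(2021, 1, 1), superseded_on=date(2023, 1, 1)),
    PolicyDoc("Refund policy v2", "Refunds within 30 days.", date(2023, 1, 1)),
]
candidates = current_policies(docs, date(2023, 10, 10))
```

Filtering before ranking means a superseded document can never outrank the current one, however well it matches the keywords.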

A Drafting Tool That Lost the Brand Voice

A content tool produced accurate drafts that sounded nothing like the company.

What Was Happening

The system instruction said "write in our brand voice" with no definition. The model had no way to know what that voice was, so it defaulted to generic.

The Lesson

Instructions must be concrete. Replacing the abstraction with two short example paragraphs in the brand voice (few-shot examples placed in context) immediately aligned the output. Showing beat telling. More patterns like this appear in Context Engineering Habits That Hold Up in Production.
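
As a rough sketch of that change, the prompt can carry the voice samples directly. The `build_voice_prompt` function, its wording, and the placeholder samples are assumptions made for illustration, not the tool's actual prompt.

```python
def build_voice_prompt(voice_examples, task):
    """Put concrete samples of the target voice ahead of the task."""
    shots = "\n\n".join(
        f"Example {i}:\n{sample}"
        for i, sample in enumerate(voice_examples, start=1)
    )
    return (
        "Match the voice, rhythm, and vocabulary of these examples.\n\n"
        f"{shots}\n\n"
        f"Task: {task}"
    )


# Placeholder samples; a real prompt would use approved company copy.
prompt = build_voice_prompt(
    ["<first approved paragraph>", "<second approved paragraph>"],
    "Draft a product update announcement.",
)
```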

A Chatbot That Forgot the Customer Mid-Conversation

A multi-turn assistant kept asking for information the user had already given.

What Was Happening

The application appended every message to the context verbatim. After enough turns, early messages, including the user's stated account and intent, fell out of the window, and the system instructions nearly went with them.

The Lesson

Conversation history needs active management. Introducing a running summary that preserved key facts while dropping verbatim chatter kept intent intact across long sessions. This failure mode is one of the most common, detailed in 7 Common Mistakes with Context Engineering.
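
One way to picture this is a context object that pins key facts and keeps only the most recent turns verbatim. This is a simplified sketch: the `ConversationContext` class is hypothetical, and it records facts explicitly, whereas a production system would more likely generate the running summary with a model.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationContext:
    system: str
    max_recent: int = 4
    facts: dict = field(default_factory=dict)   # pinned key facts
    recent: list = field(default_factory=list)  # (role, text) tuples

    def remember(self, key, value):
        """Pin a fact so it survives however long the session runs."""
        self.facts[key] = value

    def add_turn(self, role, text):
        """Append a turn and drop the oldest verbatim turns."""
        self.recent.append((role, text))
        self.recent = self.recent[-self.max_recent:]

    def render(self):
        """System instructions first, then pinned facts, then recent turns."""
        lines = [self.system]
        if self.facts:
            lines.append("Known facts:")
            lines += [f"- {k}: {v}" for k, v in self.facts.items()]
        lines += [f"{role}: {text}" for role, text in self.recent]
        return "\n".join(lines)


ctx = ConversationContext(system="You are a support assistant.", max_recent=2)
ctx.remember("account", "A-1001")  # hypothetical account id
for i in range(5):
    ctx.add_turn("user", f"turn {i}")
rendered = ctx.render()
```

Because the system instructions and pinned facts are rendered on every call, they cannot fall out of the window the way early verbatim turns did.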

A Research Helper That Compounded Its Own Error

A tool that summarized findings across steps drifted further from reality with each step.

What Was Happening

Each step fed its own output into the next step's context. One early hallucinated figure got treated as established fact and propagated through every subsequent summary.

The Lesson

This is context poisoning. The fix was validating extracted facts against source documents before passing them forward, and not letting unverified model output become authoritative context. Guarding what enters context matters as much as guarding what the model generates. The insidious part of this failure is how plausible the compounded errors looked: each step was internally consistent with the poisoned fact, so nothing seemed wrong until someone checked against the original source. Systems that feed their own output forward need a validation gate at every handoff, not just at the end.
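
A validation gate can be as simple as refusing to pass forward any figure that does not appear in the source. The sketch below is deliberately crude and rests on assumptions: the `unsupported_figures` and `gate` helpers are hypothetical, the sample sentences are made up, and it compares numbers as exact strings, so it would flag "12" against a source that says "12%".

```python
import re

# Matches integers, decimals, and percentages such as 3, 4.5, 12%.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_figures(summary, source):
    """Return figures in the summary that never appear in the source."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(summary) if n not in source_numbers]


def gate(summary, source):
    """Pass the summary forward only if every figure is backed by the source."""
    missing = unsupported_figures(summary, source)
    if missing:
        raise ValueError(f"Figures not found in source: {missing}")
    return summary


# Made-up text for illustration only.
source = "Revenue grew 12% in 2022 across 3 regions."
checked = gate("Revenue rose 12% in 2022.", source)
```

Calling `gate` at each handoff, rather than once at the end, stops a bad figure before later steps can build on it.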

A Search Feature That Buried Its Best Instruction

An internal search assistant ignored a rule it had clearly been given.

What Was Happening

The non-negotiable rule—answer only from the provided documents—sat in the middle of a long context, after a large block of retrieved text. The model attended to the surrounding evidence and effectively skipped the rule.

The Lesson

Position is not neutral. Moving the rule to the start of the system block and restating it just before the question restored compliance. The same words, repositioned, changed the behavior. The mechanics are explained in Build Reliable Context One Step at a Time.
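
A prompt assembled along those lines might look like this sketch. The `build_prompt` function and its `Rule:` and `Reminder:` labels are illustrative assumptions; only the rule text and its two positions come from the scenario.

```python
RULE = "Answer only from the provided documents."


def build_prompt(rule, documents, question):
    """Lead with the rule, then the evidence, then repeat the rule before the question."""
    doc_block = "\n\n".join(
        f"[Document {i}]\n{doc}" for i, doc in enumerate(documents, start=1)
    )
    return "\n\n".join([
        f"Rule: {rule}",
        doc_block,
        f"Reminder: {rule}",
        f"Question: {question}",
    ])


# Placeholder documents and question for illustration.
prompt = build_prompt(
    RULE,
    ["<retrieved text 1>", "<retrieved text 2>"],
    "What is the travel reimbursement limit?",
)
```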

A Classifier That Improved by Including Less

An email classifier got more accurate when the team removed context.

What Was Happening

To help the model, the team had included the full email thread, signatures, legal disclaimers, and prior classifications. The relevant signal—the latest message—was buried in noise.

The Lesson

More context is not better context. Trimming to just the latest message and a short label definition raised accuracy and cut cost. Restraint outperformed comprehensiveness, the opposite of the team's instinct.
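
The trimming step could be approximated with a few heuristics, sketched below. The `latest_message` helper and its rules are assumptions: it stops at a line matching the common "On ... wrote:" reply header or the conventional "-- " signature delimiter, and it skips lines quoted with ">". Real mail formats vary, so a production version would need more cases.

```python
import re

# Common reply header, e.g. "On Mon, Oct 9, 2023, Support wrote:"
REPLY_HEADER = re.compile(r"^On .+ wrote:$")


def latest_message(email_body):
    """Keep the newest message; drop quoted thread, signature, and what follows."""
    kept = []
    for line in email_body.splitlines():
        stripped = line.strip()
        if REPLY_HEADER.match(stripped) or stripped == "--":
            break  # everything after this is older thread or signature
        if stripped.startswith(">"):
            continue  # quoted text from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


# Made-up email for illustration only.
email = (
    "Please cancel my order.\n"
    "\n"
    "-- \n"
    "Jane\n"
    "This email is confidential.\n"
    "\n"
    "On Mon, Oct 9, 2023, Support wrote:\n"
    "> How can we help?"
)
trimmed = latest_message(email)
```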

A Knowledge Bot That Answered Beyond Its Sources

An internal knowledge assistant kept confidently answering questions its documents never covered.

What Was Happening

The system retrieved relevant documents but never instructed the model to limit itself to them. When a question fell outside the retrieved material, the model filled the gap from its general training, producing answers that sounded authoritative but were not grounded in approved sources.

The Lesson

Grounding requires an explicit boundary. Adding a concrete rule—answer only from the provided documents, and say you do not know if they do not cover it—stopped the unsupported answers. Retrieval gathers the evidence; an instruction is still needed to confine the model to it. The discipline behind that rule is covered in Context Engineering Habits That Hold Up in Production.

A Report Generator That Ran Out of Room

A tool that produced structured reports kept cutting off before finishing.

What Was Happening

The team packed so much reference material into the context that little budget remained for the model to write its output. The window was nearly full before generation even began.

The Lesson

Output competes for the same token budget as input. Reserving room for the answer—and compressing the reference material to make that room—let the reports complete. The fix was not a bigger model but a leaner context that respected the budget the response needed.
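
The budget arithmetic can be made explicit, as in the sketch below. All names and numbers are hypothetical. The sketch also drops lower-priority references instead of compressing them, which is a simpler stand-in for the compression the scenario describes, and `rough_token_count` counts words, not real model tokens.

```python
def rough_token_count(text):
    """Crude stand-in: counts whitespace-separated words, not model tokens."""
    return len(text.split())


def fit_references(references, context_window, reserved_for_output,
                   fixed_prompt_tokens, count_tokens=rough_token_count):
    """Keep references, highest priority first, until the input budget is spent."""
    budget = context_window - reserved_for_output - fixed_prompt_tokens
    if budget < 0:
        raise ValueError("Reserved output and fixed prompt exceed the window")
    kept, used = [], 0
    for ref in references:  # assumed sorted most important first
        cost = count_tokens(ref)
        if used + cost > budget:
            break
        kept.append(ref)
        used += cost
    return kept, used


# Tiny made-up numbers so the arithmetic is easy to follow:
# 10 (window) - 4 (reserved for output) - 1 (fixed prompt) = 5 for references.
kept, used = fit_references(
    ["a b c", "d e", "f g h i"],
    context_window=10,
    reserved_for_output=4,
    fixed_prompt_tokens=1,
)
```

Subtracting the output reservation first means the response always has room, whatever the reference material looks like.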

Frequently Asked Questions

What ties all these examples together?

In every case, the model's behavior was set by what it could see, not by its raw intelligence. Wrong evidence, vague instructions, lost history, poisoned facts, bad positioning, excessive noise, a missing grounding boundary, and an exhausted token budget all produced bad answers from a capable model. Fixing the context fixed the output.

How do I diagnose which problem I have?

Read the exact context the model received for a failing case. The symptom usually points to the cause: stale answers suggest retrieval, off-tone output suggests vague instructions, forgotten facts suggest history management, ignored rules suggest positioning, unsupported answers suggest a missing grounding boundary, and truncated output suggests an exhausted token budget. Inspection beats speculation every time.

Was the model ever genuinely the problem in these cases?

No. In each scenario the model reasoned correctly over the information it was given. The information was the problem. This is the typical pattern: failures attributed to model capability are far more often context gaps that supplying or trimming the right material resolves.

Can including less context really improve accuracy?

Yes, as the classifier example shows. Models weight everything in the window, including noise. When the relevant signal is buried under thread history, disclaimers, and boilerplate, trimming to the essential text often raises accuracy while also reducing cost on every call.

Where can I see a fuller end-to-end account?

For a single situation followed from problem through measured outcome, read Case Study: Context Engineering in Practice. It carries one scenario through diagnosis, decision, execution, and result rather than sampling many.

Key Takeaways

  • Stale answers usually trace to retrieval grounding on outdated documents
  • Vague instructions like "write in our brand voice" fail; concrete examples fix them
  • Long conversations need running summaries to preserve intent
  • Feeding unverified output forward causes compounding context poisoning
  • A correctly worded rule still fails if positioned in a low-attention spot
  • Removing noisy context can raise accuracy and lower cost at the same time
