Building a Disambiguation Prompt From One Clean Pair

Agency Script Editorial

Editorial Team

March 3, 2020 · 7 min read

If a model keeps misreading the same kind of input, you do not need a research project to fix it. You need one well-chosen contrastive pair and a way to confirm it worked. This article is the shortest credible path from a confused model to a working disambiguation prompt, written for someone who has never built one. It will not cover every edge case; it will get you a first real result and the habits to build on.

The work breaks into four moves: confirm you actually have a boundary problem, name the distinguishing feature, write a single clean pair, and validate it against a small fixed set. Each move has a clear stopping point, so you always know whether you are ready for the next one. Skipping a move is the usual way first attempts fail.

Before any of that, a word on expectations. A first contrastive pair often improves a boundary dramatically, but only if the pair is clean. The single biggest mistake beginners make is picking two examples that differ on several things at once, which teaches the model the wrong lesson. Most of this guide is about avoiding that.

It also helps to know what success looks like so you do not over-build. A first attempt is not meant to handle every edge case or cover every confusing input you can imagine. It is meant to move one specific, recurring boundary in the right direction and to prove, with a small fixed comparison, that it did. If you finish with one clean pair that demonstrably improved the boundary without breaking anything else, you have succeeded, even if a few rare cases still slip through. Chasing completeness on the first pass is how beginners turn an afternoon's work into a week's.

Prerequisites

You need very little to start, but the little you need is non-negotiable.

What to have ready

  • Access to the prompt you want to fix and the ability to edit and test it.
  • A handful of real inputs where the model produced the wrong reading. Real failures, not imagined ones.
  • A way to label correct outputs by hand for a small set. A spreadsheet is enough.

What you do not need

You do not need a fine-tune, a vector database, or a prompt platform. Those come later, if ever, in the staged order described in "What Tooling Earns Its Place in a Disambiguation Workflow."

Move One: Confirm It Is a Boundary Problem

Not every error is a disambiguation problem.

The test

Look at the failures. Do they cluster on a specific confusable pair, two outputs that keep getting swapped? If yes, a contrastive pair is the right tool. If the errors are scattered randomly, the issue is probably clarity or capability, and you should fix the instruction first; that decision is laid out in "When a Clearer Instruction Beats a Contrastive Pair."
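One way to run this test is to count how often each pair of labels gets swapped. The sketch below is illustrative only: the failure records, the field names `expected` and `predicted`, and the extra labels `billing` and `spam` are assumptions made up for the example. What share counts as "clustered" is a judgment call, not a fixed threshold.

```python
from collections import Counter

def dominant_confusion(failures):
    """Return the most-swapped label pair and its share of all failures.

    Each failure is a dict with 'expected' and 'predicted' labels
    (hypothetical field names). The pair is unordered, so A->B and
    B->A count toward the same boundary.
    """
    counts = Counter(
        frozenset((f["expected"], f["predicted"])) for f in failures
    )
    pair, hits = counts.most_common(1)[0]
    return sorted(pair), hits / len(failures)

# Invented failure log for illustration.
failures = [
    {"expected": "support request", "predicted": "sales lead"},
    {"expected": "support request", "predicted": "sales lead"},
    {"expected": "sales lead", "predicted": "support request"},
    {"expected": "billing", "predicted": "spam"},
]

pair, share = dominant_confusion(failures)
print(pair, share)  # ['sales lead', 'support request'] 0.75
```

If one pair accounts for most of the failures, as it does here, a contrastive pair is a reasonable next step. If no pair dominates, the errors are scattered and the instruction itself is the better place to look.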

Move Two: Name the Distinguishing Feature

Write one sentence that captures what separates the two confused outputs.

Why this is the hard part

If you cannot name the feature, you cannot teach it, and any examples you pick will vary on accidental dimensions. Spend real time here. "An existing matter implies prior engagement; a new matter does not" is the kind of sentence you are after — a feature, not a label.

Move Three: Write One Clean Pair

Now build the pair, and keep it surgical.

The construction rule

Pick two examples that differ only on the feature from Move Two. Match their length, topic, and wording as closely as you can, so the distinguishing feature is the only thing that changes. This single discipline prevents the most common failure, detailed in "Worked Cases Where Contrastive Pairs Helped or Hurt."
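A quick word-level diff can help you see how much the two examples differ before you commit to them. This is a rough heuristic rather than a definitive check; the helper name `pair_differences` and both sample emails are invented for illustration. It relies only on Python's standard `difflib` module.

```python
import difflib

def pair_differences(a, b):
    """List the word-level spans where two examples differ.

    Returns (text_in_a, text_in_b) tuples for every non-matching span.
    """
    wa, wb = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=wa, b=wb, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(wa[i1:i2]), " ".join(wb[j1:j2])))
    return diffs

# Hypothetical pair: a prospect versus an existing customer.
lead = ("We saw your demo and want to buy more seats than the trial "
        "allows. Who should we talk to?")
support = ("We are on your Team plan and want to buy more seats, but "
           "checkout keeps failing. Who should we talk to?")

for left, right in pair_differences(lead, support):
    print(f"- {left!r}  vs  {right!r}")
```

Read the output by hand. Every differing span should trace back to the feature you named in Move Two; if a span reflects something else, such as tone, product, or length, the pair is not yet clean.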

Add the reasoning

Attach a one-line justification to each example that names the deciding feature:

  • Wrong reading, with a note on why the model's instinct was tempting but incorrect.
  • Right reading, with a note naming the feature that settles it.

The justification is the part that teaches; do not omit it.
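The pair and its justifications can be kept as a small data structure and rendered into the prompt. This is a minimal sketch under stated assumptions: the `Example` class, the `render_pair` function, the `Input`/`Label`/`Why` layout, and the sample emails are all hypothetical, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str     # the input shown to the model
    label: str    # the correct reading
    because: str  # one-line justification naming the deciding feature

def render_pair(feature, first, second):
    """Render the feature sentence and both examples as a prompt block."""
    lines = [f"Distinguishing feature: {feature}", ""]
    for ex in (first, second):
        lines += [
            f"Input: {ex.text}",
            f"Label: {ex.label}",
            f"Why: {ex.because}",
            "",
        ]
    return "\n".join(lines).rstrip()

# Invented examples for illustration.
block = render_pair(
    "A sales lead has no existing relationship; a support request "
    "comes from a current customer.",
    Example(
        "We saw your demo and want to buy more seats than the trial allows.",
        "sales lead",
        "The sender has no account yet and is exploring a purchase.",
    ),
    Example(
        "We are on your Team plan and want to buy more seats, but "
        "checkout keeps failing.",
        "support request",
        "'Buy more' looks like a lead, but the sender is an existing "
        "customer who needs something fixed.",
    ),
)
print(block)
```

Note that each `because` line names the feature instead of restating the label, and the second one also says why the wrong reading was tempting.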

Move Four: Validate Against a Small Fixed Set

Confirm the pair helped before you trust it.

The minimal evaluation

Take ten to thirty hand-labeled examples, including the confusable boundary, and freeze them. Run the prompt before and after your change against the same set. Check that the boundary improved and that nothing you were not targeting got worse. This is the seed of the practice in "Reading Whether Your Disambiguation Pair Actually Worked."
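The before-and-after comparison can be scored with a few lines of code. The sketch below assumes you have already collected each prompt version's outputs by hand or with your own tooling. The item ids, the `billing` and `spam` labels, and the function names are hypothetical, and the set is shortened to six items for readability.

```python
def score(frozen_set, predictions, boundary):
    """Return (boundary_accuracy, other_accuracy) for one prompt version.

    Assumes the frozen set contains at least one boundary item and at
    least one non-boundary item; otherwise the division fails.
    """
    hits = {"boundary": [0, 0], "other": [0, 0]}  # [correct, total]
    for item_id, expected in frozen_set:
        bucket = "boundary" if expected in boundary else "other"
        hits[bucket][0] += predictions[item_id] == expected
        hits[bucket][1] += 1
    return tuple(correct / total for correct, total in hits.values())

def verdict(before, after):
    """Compare two (boundary, other) score tuples."""
    return {
        "boundary_improved": after[0] > before[0],
        "others_held": after[1] >= before[1],
    }

# Frozen, hand-labeled set: (item id, expected label). Invented data.
frozen_set = [
    ("e1", "support request"), ("e2", "support request"),
    ("e3", "sales lead"), ("e4", "sales lead"),
    ("e5", "billing"), ("e6", "spam"),
]
boundary = {"sales lead", "support request"}

before = {"e1": "sales lead", "e2": "sales lead", "e3": "sales lead",
          "e4": "sales lead", "e5": "billing", "e6": "spam"}
after = {"e1": "support request", "e2": "sales lead", "e3": "sales lead",
         "e4": "sales lead", "e5": "billing", "e6": "spam"}

before_score = score(frozen_set, before, boundary)
after_score = score(frozen_set, after, boundary)
print(before_score, after_score)  # (0.5, 1.0) (0.75, 1.0)
print(verdict(before_score, after_score))
```

Aggregate accuracy can hide an item that flipped from right to wrong while another flipped the other way, so on a set this small it is still worth reading the individual rows.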

When to stop

If the boundary improved and nothing else regressed, you are done. Resist adding more pairs out of momentum; the first pair usually does most of the work, and extras cost tokens for little gain.

A Concrete First Attempt

Walking through one example makes the four moves feel less abstract.

The setup

Imagine a prompt that sorts inbound emails into "sales lead" and "support request," and it keeps tagging frustrated existing customers as leads because they mention wanting to "buy more" while complaining. The failures cluster on exactly this pattern, so Move One passes: it is a boundary problem, not random noise.

Working the pair

For Move Two you write the feature: a sales lead has no existing relationship and is exploring a purchase, while a support request comes from a current customer who needs something resolved, even if they mention buying more. For Move Three you pick two emails that are similar in tone and length, one from a genuine prospect and one from an annoyed customer, differing only on whether a prior relationship exists. Each carries a note naming that feature. For Move Four you run twenty labeled emails before and after, watch the lead-versus-support boundary improve, and confirm the other categories held. One pair, one afternoon, a real result.

Common First-Timer Mistakes

A few errors recur often enough to flag before you start.

Treating the label as the lesson

Beginners often write "this is a support request" as the justification. That restates the answer without teaching the axis. Name the feature — "because the sender is an existing customer" — so the model learns the distinction rather than memorizing the example.

Building the pair before naming the feature

If you skip Move Two and jump to picking examples, you almost always pick a pair that differs on several things, and the model learns the wrong one. Naming the feature first is what keeps the pair clean. Do not let eagerness collapse the order of the moves.

Frequently Asked Questions

How long should my first contrastive pair take to build?

Often under an hour for the pair itself, plus the time to hand-label a small validation set. The labeling, not the writing, is usually the bulk of the effort the first time, and that set is reusable afterward.

What if I cannot find real examples of the model's mistake?

Then you may not have a problem worth fixing yet. Contrastive pairs target real, recurring errors. If the mistake is rare or hypothetical, spend your effort where the model actually fails.

Do I really need a validation set for my first attempt?

Yes, even a tiny one. Without a fixed before-and-after comparison you are guessing whether the pair helped, and a pair can improve one boundary while quietly breaking another. Ten to thirty labeled examples is enough to start.

My pair did not help. What went wrong?

The most likely cause is that your two examples differed on more than the target feature, so the model learned the wrong distinction. Rebuild the pair to vary on exactly one dimension and try again.

When should I add a second pair?

Only when validation shows a remaining failure mode the first pair did not cover. Adding pairs preemptively wastes tokens. Let the measured failures tell you what the next pair should target.

Key Takeaways

  • You need only an editable prompt, a few real failure examples, and a way to hand-label a small set to start.
  • First confirm the errors cluster on a specific confusable boundary; scattered errors call for an instruction fix instead.
  • Name the single distinguishing feature in one sentence before choosing any examples.
  • Build one pair that varies on exactly that feature, with a justification on each example, and validate against a small fixed set.
  • The first clean pair usually does most of the work; add a second only when measured failures demand it.
