Shrink a Prompt in Six Measured Steps You Can Run Today

Agency Script Editorial

Editorial Team

May 17, 2022 · 7 min read

Knowing the theory of prompt compression and actually shrinking a prompt are different skills. This guide covers the second. What follows is a sequential process you can run today on a prompt you already have in production, with each step defined concretely enough to follow without interpretation. No abstractions about token efficiency: just do this, then that, and measure as you go.

The process is deliberately conservative. It compresses one thing at a time and checks quality after every change, because the fastest way to ruin a working prompt is to cut several things at once and lose track of which cut did the damage. Follow the steps in order. Each builds on the artifact the previous step produced.

You will need a prompt to compress and a small set of representative inputs to test it against. Gather those before step one. The representative inputs matter more than people expect: they are the only thing standing between a safe cut and a silent regression, so spend a few minutes choosing inputs that cover both your common cases and the awkward edge cases where hidden constraints live. A test set that only contains easy inputs will bless cuts that quietly break the hard ones.

Step 1: Build the Baseline

You cannot tell whether compression hurt quality without knowing what good looked like beforehand.

Do this

  • Pick five to ten inputs that represent the real range of what the prompt handles.
  • Run the current prompt on each and save the outputs.
  • Record the prompt's token count and rate each output against a simple rubric, such as a 1-to-5 score or pass/fail on the requirements you care about.

This baseline is the reference every later step compares against. Skipping it means flying blind, which is the root of most compression failures.
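As a rough illustration of the baseline step, the sketch below runs a prompt over a small test set and records the outputs next to an approximate token count. Everything named here is an assumption for illustration: `run_model` stands in for whatever model call you actually use, `fake_model` exists only so the sketch runs offline, and `estimate_tokens` uses a crude four-characters-per-token guess rather than a real tokenizer.

```python
import json
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Swap in your model's real tokenizer for accurate counts.
    return max(1, len(text) // 4)


def build_baseline(
    prompt: str,
    inputs: list[str],
    run_model: Callable[[str, str], str],
) -> dict:
    """Run the current prompt on every test input and record the outputs."""
    return {
        "prompt_tokens": estimate_tokens(prompt),
        "cases": [
            # "quality" is filled in by hand (or by a rubric) after review.
            {"input": text, "output": run_model(prompt, text), "quality": None}
            for text in inputs
        ],
    }


def fake_model(prompt: str, user_input: str) -> str:
    # Hypothetical stand-in so the sketch runs without an API key.
    return f"reply to: {user_input}"


current_prompt = "You are a helpful support agent. " * 5
test_inputs = ["refund request", "angry customer"]

baseline = build_baseline(current_prompt, test_inputs, fake_model)
print(json.dumps(baseline, indent=2))
```

Saving the printed JSON to a file gives later steps something fixed to compare against.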

Step 2: Find the Fat

Before cutting, locate where the tokens actually are.

Do this

  • Read the prompt and mark each section by purpose: instructions, examples, context, background.
  • Estimate the token weight of each section.
  • Flag anything repeated, anything that reads as filler, and any context that may be irrelevant.

The largest sections that carry the least task-specific information are your first targets. Often a long system preamble or a padded context block dominates the token count, and those are where the easy wins hide.
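One way to locate the fat is to split the prompt into named sections, estimate each section's weight, and look for repeated sentences. The sketch below assumes you have already split the prompt by hand into a dictionary. The section names, the sample text, and the four-characters-per-token estimate are all illustrative assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def rank_sections(sections: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, tokens, share of total), heaviest section first."""
    weights = {name: estimate_tokens(body) for name, body in sections.items()}
    total = sum(weights.values())
    rows = [(name, w, w / total) for name, w in weights.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def repeated_sentences(sections: dict[str, str]) -> set[str]:
    """Flag sentences that appear more than once anywhere in the prompt."""
    seen: set[str] = set()
    repeated: set[str] = set()
    for body in sections.values():
        for sentence in body.split("."):
            normalized = sentence.strip().lower()
            if not normalized:
                continue
            if normalized in seen:
                repeated.add(normalized)
            seen.add(normalized)
    return repeated


# Hypothetical prompt, already split into sections by purpose.
sections = {
    "instructions": "Answer in two sentences. Cite the policy.",
    "context": "Our company was founded long ago. " * 6 + "Cite the policy.",
    "examples": "Q: Where is my order? A: It ships Monday.",
}

for name, tokens, share in rank_sections(sections):
    print(f"{name:<14}{tokens:>5} tokens {share:>6.0%}")
print("repeated:", sorted(repeated_sentences(sections)))
```

A heavy section with a high share and repeated sentences, like the padded context block here, is a natural first candidate.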

Step 3: Make One Cut

Now compress, but only one thing.

Do this

  • Choose a single flagged section.
  • Apply one technique: remove filler, tighten instructions into bullets, or drop an irrelevant passage.
  • Leave everything else untouched.

Resisting the urge to fix five things at once is the discipline that makes this process work. One change means you can attribute any quality shift to exactly that change. This is the same single-variable rule explained in Saying More to a Model With Fewer Tokens.
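The single-change rule can be enforced mechanically. In this minimal sketch, a hypothetical `apply_one_cut` helper replaces exactly one named section and returns a new copy, and `changed_sections` confirms that nothing else moved. The section names and text are invented for illustration.

```python
def apply_one_cut(sections: dict[str, str], name: str, new_text: str) -> dict[str, str]:
    """Return a copy of the prompt sections with exactly one section replaced."""
    if name not in sections:
        raise KeyError(f"unknown section: {name}")
    candidate = dict(sections)  # copy, so the original stays available for a revert
    candidate[name] = new_text
    return candidate


def changed_sections(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List the sections whose text differs between two versions."""
    return [name for name in before if before[name] != after.get(name)]


# Hypothetical prompt sections.
original = {
    "preamble": "Thank you so much for helping our customers today.",
    "rules": "Escalate legal threats. Never promise refunds.",
}

candidate = apply_one_cut(original, "preamble", "")
print(changed_sections(original, candidate))  # ['preamble']
```

Because the original dictionary is never mutated, reverting a bad cut is just a matter of discarding the candidate.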

Step 4: Re-Measure Against the Baseline

This is the step that turns guessing into knowing.

Do this

  • Run the compressed prompt on the same five-to-ten inputs from step one.
  • Compare each output to its baseline counterpart.
  • Record the new token count and whether quality held, improved, or dropped.

If quality held or improved, keep the cut: the prompt is shorter and nothing was lost. If it dropped, the section you cut carried signal, not filler. Revert it and try a different section. This compare-and-keep loop is what separates compression from accidental deletion, the failure cataloged in "7 Common Mistakes with Prompt Compression Techniques (and How to Avoid Them)".
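The keep-or-revert decision can be written down as a small function. This is a sketch under stated assumptions: quality is expressed as a hand-assigned integer score per test case (any rubric would do), and the case names and token counts are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case: str
    baseline_score: int  # score of the baseline output, e.g. on a 1-5 rubric
    new_score: int       # score of the output from the compressed prompt


def decide(results: list[CaseResult], baseline_tokens: int, new_tokens: int) -> dict:
    """Keep the cut only if no test case scored lower than its baseline."""
    regressions = [r.case for r in results if r.new_score < r.baseline_score]
    return {
        "keep": not regressions,
        "tokens_saved": baseline_tokens - new_tokens,
        "regressions": regressions,
    }


# Illustrative scores for two test cases after one cut.
results = [
    CaseResult("refund request", baseline_score=4, new_score=4),
    CaseResult("legal threat", baseline_score=5, new_score=3),
]
verdict = decide(results, baseline_tokens=600, new_tokens=520)
print(verdict)
```

Here one case regressed, so the verdict is to revert even though the cut saved tokens. The list of regressions shows which inputs exposed the problem.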

Step 5: Repeat Until the Returns Shrink

One cut is rarely the whole opportunity.

Do this

  • Return to step three and compress the next flagged section.
  • Re-measure each time.
  • Stop when remaining cuts either threaten quality or save too few tokens to matter.

Compression has diminishing returns. The first two or three cuts usually reclaim most of the available tokens; chasing the last few percent often costs more in quality than it saves in tokens. Knowing when to stop is part of the skill.
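The repeat-and-stop logic might look like the loop below. It is a sketch, not a prescription: `quality_holds` stands in for re-running your test set and comparing against the baseline, `min_saving` is an arbitrary threshold for "too few tokens to matter", and the sections and cuts are made-up placeholders.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def compress_loop(
    sections: dict[str, str],
    cuts: list[tuple[str, str]],
    quality_holds: Callable[[dict[str, str]], bool],
    min_saving: int = 10,
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Try one cut at a time, keeping each only if quality holds."""
    current = dict(sections)
    log: list[tuple[str, str]] = []
    for name, new_text in cuts:
        saving = estimate_tokens(current[name]) - estimate_tokens(new_text)
        if saving < min_saving:
            log.append((name, "skipped"))  # too small to be worth the risk
            continue
        candidate = {**current, name: new_text}
        if quality_holds(candidate):
            current = candidate
            log.append((name, "kept"))
        else:
            log.append((name, "reverted"))  # the section carried signal
    return current, log


# Placeholder sections and proposed cuts.
sections = {"preamble": "x" * 200, "tone": "y" * 400, "rules": "z" * 100}
cuts = [("preamble", ""), ("tone", "Be warm."), ("rules", "z" * 90)]


def quality_holds(candidate: dict[str, str]) -> bool:
    # Stand-in check: pretend any change to the tone block hurts output quality.
    return candidate["tone"] == sections["tone"]


final, log = compress_loop(sections, cuts, quality_holds)
print(log)
```

In this run the preamble cut is kept, the tone cut is reverted, and the rules cut is skipped because it would save too little.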

Step 6: Lock and Document the Result

A compressed prompt that nobody records will drift back to bloat.

Do this

  • Save the final prompt in version control, not in a chat history.
  • Note the total tokens saved and confirm the quality baseline still holds.
  • Write down which sections you compressed and which you deliberately left alone.

Documenting what you left alone matters as much as what you cut, because it tells the next person which sections are load-bearing. A one-line note saying "this looks verbose but is required to trigger escalation" can save a future editor from re-introducing a regression you already found and reverted. For a sense of how much real prompts compress, walk through Case Study: Prompt Compression Techniques in Practice.
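A compression record can be as simple as a small JSON file committed next to the prompt. The field names and token counts below are illustrative assumptions, not a standard format. The note on the untouched section reuses the example wording above.

```python
import json


def make_record(
    tokens_before: int,
    tokens_after: int,
    compressed: list[str],
    left_alone: dict[str, str],
    quality_held: bool,
) -> dict:
    """Summarize one compression effort for whoever edits the prompt next."""
    return {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "tokens_saved": tokens_before - tokens_after,
        "quality_baseline_held": quality_held,
        "compressed": compressed,
        "left_alone": left_alone,  # section name -> why it must stay
    }


record = make_record(
    tokens_before=600,  # illustrative numbers
    tokens_after=450,
    compressed=["preamble: deleted", "examples: three reduced to one"],
    left_alone={
        "escalation rules": "this looks verbose but is required to trigger escalation",
    },
    quality_held=True,
)
print(json.dumps(record, indent=2))
```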

One last habit worth building: schedule a re-check after any model upgrade. A compression validated against today's model is not guaranteed safe against tomorrow's, because an update can change which tersely-phrased instructions the model still follows. Re-running your baseline test set after an upgrade is cheap insurance against a prompt that was lean and correct silently becoming lean and wrong.
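A post-upgrade re-check can reuse the same test inputs. The sketch below assumes a hypothetical `passes` function that encodes what a correct output looks like for each input. `old_model` and `new_model` are toy stand-ins that simulate a model update changing behavior on one edge case.

```python
from typing import Callable


def recheck(
    prompt: str,
    test_inputs: list[str],
    run_model: Callable[[str, str], str],
    passes: Callable[[str, str], bool],
) -> list[str]:
    """Return the inputs whose output no longer passes the quality check."""
    return [text for text in test_inputs if not passes(text, run_model(prompt, text))]


def old_model(prompt: str, text: str) -> str:
    # Toy stand-in: follows the terse escalation instruction.
    return "ESCALATE" if "legal" in text else "ANSWER"


def new_model(prompt: str, text: str) -> str:
    # Toy stand-in: after the "upgrade", the terse instruction is ignored.
    return "ANSWER"


def passes(text: str, output: str) -> bool:
    expected = "ESCALATE" if "legal" in text else "ANSWER"
    return output == expected


prompt = "Escalate legal issues."
test_inputs = ["where is my order", "legal threat over refund"]

print(recheck(prompt, test_inputs, old_model, passes))  # []
print(recheck(prompt, test_inputs, new_model, passes))  # ['legal threat over refund']
```

An empty list means the compressed prompt still holds. Any input that appears is a case to investigate before trusting the prompt on the new model.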

A Worked Pass Through the Steps

To make the process concrete, here is what a single pass looks like on a realistic prompt, so you can picture each step before running your own.

The starting point

Imagine a system prompt with four parts: a polite preamble, a long block of tone guidance, a list of task rules, and a set of three worked examples. It runs on every request, so every token counts repeatedly.
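To see why per-request tokens add up, a back-of-the-envelope calculation is enough. The request volume and the compressed size below are assumed values for illustration only.

```python
def tokens_saved(tokens_before: int, tokens_after: int, requests: int) -> int:
    """Total prompt tokens saved across a number of requests."""
    return (tokens_before - tokens_after) * requests


# Assumed: a 600-token prompt trimmed to 450 tokens, sent on 10,000 requests.
saved = tokens_saved(600, 450, 10_000)
print(saved)  # 1500000
```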

Walking the steps

  • Baseline (Step 1): You run the prompt on eight representative inputs and save the outputs. The prompt is, say, 600 tokens.
  • Find the fat (Step 2): You mark the preamble as pure filler, the tone block as verbose but possibly load-bearing, the rules as essential, and the three examples as more than the task needs.
  • First cut (Step 3): You delete the preamble only, leaving everything else alone.
  • Re-measure (Step 4): The eight outputs are unchanged in quality, and the prompt is now smaller. You keep the cut.
  • Repeat (Step 5): Next pass, you reduce three examples to one. Quality still holds. The pass after, you tighten the tone block into bullets—and one output gets slightly worse, so you revert that part and keep only the safe portion of the tightening.
  • Lock (Step 6): You stop when remaining cuts threaten quality, save the result in version control, and note that the rules and the surviving example are load-bearing.

The point of walking through it is to show that the process is unglamorous on purpose. Each step is small, each result is measured, and the safety comes from never changing more than one thing between measurements. That is the entire trick—there is no clever shortcut that beats measuring.

Frequently Asked Questions

How many test inputs do I really need?

Five to ten that genuinely represent the range of real usage. The goal is not statistical rigor but enough coverage to notice if a cut breaks a common case. Too few and you miss regressions; far more and the loop gets slow without adding much confidence.

Why only one cut at a time?

Because if you change several things and quality drops, you cannot tell which change caused it. One cut per measurement keeps every result attributable, so you keep the good cuts and revert only the harmful one instead of throwing away the whole batch.

When should I stop compressing?

When the next cut either threatens quality or saves too few tokens to justify the risk. The first few cuts usually capture most of the savings, and chasing the last percent tends to cost more in quality than it returns in tokens.

What if every section seems load-bearing?

Then the prompt may already be efficient, which is a fine outcome. More often, the load-bearing sections can still be tightened in wording—turned into bullets, stripped of filler—without removing any actual requirement. Tighten the phrasing before concluding there is nothing to cut.

Key Takeaways

  • Build a quality baseline on representative inputs before changing anything—it is the reference for every later step.
  • Locate the fat by mapping each section's purpose and token weight before cutting.
  • Make one cut at a time so any quality change is attributable to that single change.
  • Re-measure after each cut and keep it only if quality holds; revert if it drops.
  • Stop when returns shrink, then lock and document the result in version control, including what you left alone.
