A playbook is different from a tutorial. A tutorial teaches you the technique once; a playbook tells you which technique to run, when to run it, who owns it, and what to do when it fails. Prompt compression needs the playbook treatment because the technique is the easy part. The hard part is knowing which prompt to compress this week, how aggressively, and how to roll it out without breaking production.
What follows is an operating manual organized as plays. Each play has a trigger that tells you when to run it, a sequence of moves, and an owner who is accountable for the outcome. You can run the whole sequence end to end on a new project, or pull individual plays as situations arise. The plays are ordered the way they usually need to happen, but the triggers, not the order, decide what you run.
Treat this as a working reference rather than a one-time read. The value is in returning to it when a specific trigger fires.
Play One: Establish the Baseline
Trigger: any time before you compress a prompt for the first time, or whenever you lack a current quality measurement.
The moves
- Assemble a representative evaluation set with known-good outputs, including edge cases.
- Run it against the current prompt and record accuracy and token count.
- Store both numbers as the baseline that every future change is measured against.
Owner: whoever owns the prompt's quality. Without this play, every other play is guesswork.
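The three moves above can be sketched in a few lines of Python. Everything named here is an assumption for illustration: `EvalCase`, `Baseline`, and `measure_baseline` are hypothetical names, exact-match scoring stands in for whatever grading your task needs, `count_tokens` splits on whitespace in place of your model's real tokenizer, and `run_model` is a callable you would supply.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One evaluation input paired with its known-good output."""
    input_text: str
    expected: str


@dataclass(frozen=True)
class Baseline:
    accuracy: float      # fraction of cases answered correctly
    prompt_tokens: int   # size of the prompt itself


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the model's tokenizer.
    return len(text.split())


def measure_baseline(
    prompt: str,
    cases: Sequence[EvalCase],
    run_model: Callable[[str, str], str],
) -> Baseline:
    """Run every case against the prompt; record accuracy and token count."""
    if not cases:
        raise ValueError("evaluation set is empty")
    correct = sum(
        run_model(prompt, case.input_text).strip() == case.expected
        for case in cases
    )
    return Baseline(
        accuracy=correct / len(cases),
        prompt_tokens=count_tokens(prompt),
    )
```

Store the returned `Baseline` next to the prompt, for example in version control, so later plays compare against the same numbers.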
Play Two: Harvest the Obvious Bloat
Trigger: a prompt is high-volume and has never been compressed.
The moves
- Remove conversational filler, redundant restatements, and instructions repeated several ways.
- Convert verbose examples into concise ones, or drop examples a one-line description can replace.
- Re-run the evaluation set and confirm accuracy holds.
This play is low-risk and usually recovers the largest single chunk of savings. Owner: the prompt's engineer. The team-level version of this work appears in Rolling Out Leaner Prompts Without Breaking Your Team.
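The last move, confirming that accuracy holds, can be made mechanical with a small acceptance gate. This is a sketch under assumptions: `Measurement` and `review_compression` are hypothetical names, and the default tolerance of zero accuracy loss is a placeholder you would tune to your own evaluation noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    accuracy: float      # fraction correct on the evaluation set
    prompt_tokens: int   # prompt size in tokens


def review_compression(
    baseline: Measurement,
    candidate: Measurement,
    max_accuracy_drop: float = 0.0,
) -> dict:
    """Compare a trimmed prompt with the baseline and decide whether to keep it."""
    if baseline.prompt_tokens <= 0:
        raise ValueError("baseline must have a positive token count")
    accuracy_drop = baseline.accuracy - candidate.accuracy
    tokens_saved = baseline.prompt_tokens - candidate.prompt_tokens
    return {
        # Accept only if accuracy held (within tolerance) and tokens went down.
        "accepted": accuracy_drop <= max_accuracy_drop and tokens_saved > 0,
        "accuracy_drop": accuracy_drop,
        "tokens_saved": tokens_saved,
        "savings_pct": 100 * tokens_saved / baseline.prompt_tokens,
    }
```

For instance, trimming a 1,200-token prompt to 800 tokens with unchanged accuracy is accepted and reports a saving of about 33 percent.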
Play Three: Surgical Compression of the Core
Trigger: the obvious bloat is gone, but the prompt's token count is still high for the volume it serves.
The moves
- Identify the load-bearing instructions by removing each and measuring the effect.
- Keep anything whose removal moves accuracy, especially on edge cases.
- Tighten phrasing of what remains without changing meaning.
This is the riskiest and slowest play. Make one change at a time so you can attribute any regression precisely. Owner: the prompt's engineer. The risk profile is detailed in When Shrinking Prompts Quietly Degrades Your Output.
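Removing each instruction and measuring the effect is a leave-one-out ablation, which might look like the sketch below. The names are hypothetical, `evaluate` is a callable you would supply that returns accuracy for a prompt, and the `min_effect` threshold of one percentage point is an assumption. The sketch also assumes the prompt can be split into independent instruction lines, which is not always true.

```python
from typing import Callable, Sequence


def find_load_bearing(
    instructions: Sequence[str],
    evaluate: Callable[[str], float],
    min_effect: float = 0.01,
) -> list[tuple[str, float]]:
    """Drop one instruction at a time and report those whose removal hurts.

    Returns (instruction, accuracy_drop) pairs in their original order.
    """
    items = list(instructions)
    full_accuracy = evaluate("\n".join(items))
    load_bearing = []
    for i, instruction in enumerate(items):
        # Rebuild the prompt without instruction i only.
        ablated = "\n".join(items[:i] + items[i + 1:])
        drop = full_accuracy - evaluate(ablated)
        if drop >= min_effect:
            load_bearing.append((instruction, drop))
    return load_bearing
```

Because each ablation costs a full evaluation run, the runtime grows with the number of instructions. Scoring edge cases separately inside `evaluate` would keep a small but important regression from being averaged away.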
Play Four: Stage the Rollout
Trigger: a meaningful compression has passed the evaluation set and is headed to production.
The moves
- Deploy behind a flag or to a fraction of traffic first.
- Watch production quality and latency metrics for the regressions your evaluation set may have missed.
- Promote to full traffic only after the staged segment holds steady.
Owner: the engineer plus whoever monitors production. Staging converts a silent failure into a contained one.
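One common way to send a fraction of traffic to the compressed prompt is deterministic hashing on a stable key, so the same user or request always lands on the same side. The sketch below assumes you have such a key; `in_rollout` and `choose_prompt` are hypothetical names, not part of any flagging library.

```python
import hashlib


def in_rollout(request_key: str, rollout_percent: float) -> bool:
    """Map a key to a stable bucket in [0, 100) and compare to the rollout size."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then scale to 0.00 .. 99.99.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    return bucket < rollout_percent


def choose_prompt(
    request_key: str,
    verbose_prompt: str,
    compressed_prompt: str,
    rollout_percent: float,
) -> str:
    """Serve the compressed prompt only to the staged share of traffic."""
    if in_rollout(request_key, rollout_percent):
        return compressed_prompt
    return verbose_prompt
```

Setting `rollout_percent` to 0 disables the compressed prompt and 100 promotes it fully, so the same code path serves staging and promotion. Log which prompt served each request so production metrics can be split by segment.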
Play Five: Lock In a Fallback
Trigger: the compressed prompt serves a high-stakes path.
The moves
- Keep the previous verbose prompt documented and reachable behind a flag.
- Verify you can revert in a single change.
- Note the revert procedure where on-call can find it.
Keeping a fallback on the shelf costs almost nothing, and it turns a quality incident into a one-line rollback rather than an emergency rewrite.
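A single-change revert can be as simple as one flag that selects between two stored prompts. In this sketch the `PROMPT_FALLBACK` environment variable, the `PROMPTS` table, and both prompt texts are made up for illustration; a real deployment would more likely read the flag from its own configuration or feature-flag system.

```python
import os
from typing import Mapping, Optional

# Both versions stay in the codebase so the verbose one is always reachable.
PROMPTS = {
    "compressed": "Classify the ticket. Reply with one label.",
    "verbose": (
        "You are a support assistant. Read the ticket carefully, "
        "decide which category fits best, and reply with exactly one label."
    ),
}


def active_prompt(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the compressed prompt unless the fallback flag is set to "1"."""
    env = os.environ if env is None else env
    use_fallback = env.get("PROMPT_FALLBACK", "0") == "1"
    return PROMPTS["verbose" if use_fallback else "compressed"]
```

Reverting then means setting one variable, which is also the step to write down where on-call can find it.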
Play Six: Guard Against Drift
Trigger: standing. This play runs continuously once a prompt is in production.
The moves
- Add a token-count check in continuous integration that flags growth past the expected size.
- Schedule a quarterly audit that samples production prompts against your standard.
- Re-validate compressed prompts whenever you change models.
Owner: a designated prompt steward. Drift is the failure mode that quietly reverses every other play, so this one never stops running. The repeatable process behind it is in Turning Prompt Trimming Into a Repeatable, Hand-Off-Able Process.
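The token-count check in the first move could be a short script that compares each prompt against a recorded budget. This is a sketch under assumptions: the function names are hypothetical, the budgets would come from your own baseline records, and whitespace splitting stands in for the production tokenizer.

```python
from typing import Callable, Mapping


def whitespace_tokens(text: str) -> int:
    # Assumption: a crude stand-in for the production tokenizer.
    return len(text.split())


def check_token_budgets(
    prompts: Mapping[str, str],
    budgets: Mapping[str, int],
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> list[str]:
    """Return one message per prompt that is over budget or has no budget."""
    failures = []
    for name, text in prompts.items():
        budget = budgets.get(name)
        if budget is None:
            failures.append(f"{name}: no token budget recorded")
            continue
        used = count_tokens(text)
        if used > budget:
            failures.append(f"{name}: {used} tokens exceeds budget of {budget}")
    return failures
```

In continuous integration, a wrapper would print the returned messages and exit with a non-zero status when the list is not empty. Treating a missing budget as a failure keeps new prompts from slipping past the check.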
Play Seven: Measure the Program, Not Just the Prompt
Trigger: standing. Review monthly once you have several compressed prompts in production.
The moves
- Track aggregate token spend across all compressed prompts and its trend over time.
- Track adoption: how many production prompts have been through the pipeline versus how many remain untouched.
- Track quality incidents attributable to compression, which should stay near zero if the earlier plays are working.
Why it matters
Individual prompt wins can hide a program that is quietly stalling. If adoption flattens while a handful of prompts carry all the savings, you have a fragile situation that depends on one or two people. Measuring at the program level surfaces that before it becomes a problem. The adoption signals worth watching are detailed in Rolling Out Leaner Prompts Without Breaking Your Team.
Owner: the prompt steward, reporting to whoever owns the cost or reliability budget.
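The three tracked quantities, plus a concentration measure for the fragility described above, can be computed from one record per prompt. The record fields and the `top_prompt_share` metric are assumptions made for this sketch, not a standard reporting schema.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PromptRecord:
    name: str
    compressed: bool              # has it been through the pipeline?
    monthly_tokens: int           # current monthly spend
    baseline_monthly_tokens: int  # monthly spend before compression
    incidents: int                # quality incidents attributed to compression


def program_report(records: Sequence[PromptRecord]) -> dict:
    """Summarize spend, adoption, incidents, and how concentrated savings are."""
    compressed = [r for r in records if r.compressed]
    savings = [r.baseline_monthly_tokens - r.monthly_tokens for r in compressed]
    total_saved = sum(savings)
    return {
        "compressed_spend": sum(r.monthly_tokens for r in compressed),
        "adoption": len(compressed) / len(records) if records else 0.0,
        "incidents": sum(r.incidents for r in compressed),
        "tokens_saved": total_saved,
        # Share of all savings carried by the single best prompt; a value
        # near 1.0 suggests the program rests on very few prompts.
        "top_prompt_share": max(savings) / total_saved if total_saved > 0 else 0.0,
    }
```

Comparing this report month over month shows whether adoption is still growing or whether a few early wins are carrying the totals.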
Anti-Plays: What Not to Do
Knowing the moves to avoid is as useful as knowing the moves to run.
Compressing without a baseline
Cutting tokens before you have measured current quality is the cardinal anti-play. You will save money and have no idea whether you broke anything. Every play above depends on play one for a reason.
Optimizing low-volume prompts
Spending a careful afternoon on a prompt that runs ten times a day is effort that should have gone to a high-volume prompt instead. Let an intake ranking based on call volume, not curiosity, decide what you work on.
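A rough way to build such a ranking is to multiply call volume by prompt size and sort. The sketch below uses hypothetical names and invented figures: a 2,000-token prompt called ten times a day costs 20,000 prompt tokens daily, while a 400-token prompt called 50,000 times a day costs 20,000,000.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PromptUsage:
    name: str
    calls_per_day: int
    prompt_tokens: int

    @property
    def daily_tokens(self) -> int:
        # Prompt tokens consumed per day by this prompt alone.
        return self.calls_per_day * self.prompt_tokens


def rank_for_compression(usages: Sequence[PromptUsage]) -> list[PromptUsage]:
    """Order prompts by daily token consumption, largest first."""
    return sorted(usages, key=lambda u: u.daily_tokens, reverse=True)
```

In this example the shorter prompt ranks first by a factor of a thousand, even though the longer prompt looks like the more tempting target.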
Treating safety constraints as fair game
Guardrail language is the one category to exclude from compression entirely unless you can prove the behavior holds without it. The few tokens saved are never worth a compliance failure. The deeper risk analysis is in When Shrinking Prompts Quietly Degrades Your Output.
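One way to enforce this exclusion is to keep guardrail sentences in a separate list and check that each still appears verbatim in the compressed prompt. The function name and the example guardrails are hypothetical, and this is only a textual check: it shows that the wording survived, not that the model still obeys it.

```python
from typing import Sequence


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so reflowed lines still match.
    return " ".join(text.split())


def missing_guardrails(
    compressed_prompt: str,
    guardrails: Sequence[str],
) -> list[str]:
    """Return every guardrail sentence that no longer appears in the prompt."""
    haystack = _normalize(compressed_prompt)
    return [g for g in guardrails if _normalize(g) not in haystack]
```

A non-empty result would block the change in review or in continuous integration.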
Sequencing the Plays
The plays form a pipeline for a new prompt and a maintenance loop for an existing one.
For a new prompt
Run plays one through five in order: baseline, harvest, surgical compression, staged rollout, fallback. Then hand off to play six, which runs for as long as the prompt is in production.
For an existing prompt
Start with the trigger that fired. A latency complaint sends you to play two or three. A model upgrade sends you straight to play six's re-validation step. Let the trigger, not the sequence number, decide your entry point.
Frequently Asked Questions
Do I have to run every play on every prompt?
No. The triggers decide. A low-volume prompt may never get past play one, and that is correct. The plays exist so that when a trigger fires you know exactly what to do, not so that you run all of them mechanically on everything.
Who should own the playbook overall?
A single prompt steward or small guild should own the standard and the drift-guarding play, while individual engineers own the compression of their own prompts. Splitting ownership this way keeps the system maintained without making one person a bottleneck for every change.
How long does running the full pipeline take for one prompt?
The obvious-bloat play is often an afternoon. The surgical play can take days because each cut needs its own measurement. Staging adds calendar time but little effort. Budget the bulk of your time for play three, which is where the careful, slow work lives.
What triggers a re-run on an already-compressed prompt?
A model change, a noticeable shift in your input distribution, a latency or cost complaint, or a failed drift check. Any of these sends you back into the pipeline at the relevant play rather than starting over from scratch.
How do I keep the playbook itself from going stale?
Review it whenever a play repeatedly fails to catch a problem, or when a new model class changes what safe compression looks like. The playbook is a living document; update it from the incidents it misses.
Key Takeaways
- A playbook tells you which compression move to run, when, and who owns it, not just how.
- Always establish a measured baseline before compressing anything.
- Harvest obvious bloat first, then move to slow, surgical, one-change-at-a-time compression.
- Stage rollouts and keep a verbose fallback so failures stay contained and reversible.
- Run the drift-guarding play continuously, because drift quietly reverses every other gain.