AGENCYSCRIPT

Habits That Keep a Production Prompt From Caving In

Agency Script Editorial

Editorial Team

November 12, 2019 · 8 min read
Tags: adversarial prompt stress testing, adversarial prompt stress testing best practices, adversarial prompt stress testing guide, prompt engineering

Best-practice lists tend to dissolve into platitudes: "test thoroughly," "think like an attacker," "iterate." Nobody disagrees, and nobody can act on it. The practices below are different. Each one is a specific, sometimes uncomfortable choice, paired with the reasoning that justifies it. Some will contradict how your team works today. That contradiction is where the value is.

These practices come from a simple observation: prompts that survive contact with real users share a small set of habits, and prompts that fail in production usually violate several of them at once. The habits are not about working harder. They are about working in an order and with a discipline that prevents the most common collapses.

Treat this as a set of defaults to adopt deliberately, not a checklist to skim. Where a practice feels like overkill for your stakes, scale it down on purpose rather than skipping it by accident.

Underneath all of them sits one conviction: a prompt is not a piece of clever writing; it is a component that will be operated by strangers under conditions you do not control. Once you accept that framing, these practices stop feeling like extra process and start feeling like the minimum care any production component deserves. The teams that internalize this ship prompts that hold; the teams that treat prompts as throwaway text ship prompts that get screenshotted misbehaving.

Separate the Author From the Attacker

Why the Author Is the Worst Tester

The person who wrote a prompt has already decided what users will do. Their tests rehearse that assumption. The most reliable way to find real weaknesses is to put the prompt in front of someone who did not write it and ask them to break it.

Make Adversarial Review a Role, Not a Mood

Assign someone the explicit job of attacker for each prompt, even if it is a colleague for an hour. A named role produces real attempts; "everyone should think adversarially" produces none. This pairs well with the structured process in "Run Hostile Inputs at Your Prompts, One Step at a Time."

The psychology matters here. People are reluctant to break a colleague's work without permission, so the role is also a license. Naming someone the attacker tells them their job today is to make this thing fail, and most people are surprisingly good at it once they are allowed.

Write Boundaries Before You Write Tests

Untested Boundaries Are Just Hopes

You cannot test a boundary you have not stated. Before any attack, write what the prompt must do and must never do, in concrete terms. This definition becomes the standard every output is judged against.
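One way to make that written definition executable is to store each boundary as a named check that an output either passes or fails. The sketch below is a minimal illustration, not a recommended detector: the `Boundary` class, the `matches` and `judge` helpers, the example rules for a support assistant, and the regular expressions are all assumptions made for this example, and real checks would need far more than keyword patterns.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


def matches(pattern: str) -> Callable[[str], bool]:
    """Build a check that is True when the pattern appears in an output."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda output: compiled.search(output) is not None


@dataclass(frozen=True)
class Boundary:
    name: str                        # the rule, written as a "never" statement
    violated: Callable[[str], bool]  # True when an output crosses the rule


# Hypothetical rules with deliberately crude patterns, for illustration only.
BOUNDARIES = [
    Boundary("never issue refunds",
             matches(r"refund (has been|was) (issued|processed)")),
    Boundary("never give medical advice",
             matches(r"you should take \d+ ?mg")),
]


def judge(output: str) -> List[str]:
    """Return the names of violated boundaries; an empty list is a pass."""
    return [b.name for b in BOUNDARIES if b.violated(output)]
```

An empty list is a pass. Any name in the list is a fail that can be argued against the written rule rather than against opinion.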

Make Boundaries Specific Enough to Fail

"Be helpful and safe" is untestable. "Never reveal another customer's data; never issue refunds; never give medical advice" is testable. Specificity is what lets a tester declare an output a clear pass or a clear fail instead of arguing about it.

Maintain a Living Attack Inventory

One-Off Testing Decays Immediately

A prompt tested once is verified only for that moment. Models change, prompts change, and new attack styles appear. A saved, versioned attack inventory is what makes testing repeatable rather than heroic.

Grow It From Real Traffic

The best new attacks come from watching how actual users phrase things. Real users are collectively more inventive than any single tester, so their odd inputs are a free and constantly refreshing source of test cases. Feed surprising real inputs back into the inventory so it gets sharper over time, and treat every surprising production message as a candidate for the inventory rather than a one-off curiosity.

Pair the inventory with a launch gate like our "Twelve Checks Before You Ship a Prompt to Real Traffic."
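A saved, versioned inventory can be as simple as a deduplicated collection that bumps a version number whenever it grows. The sketch below assumes hypothetical `Attack` and `Inventory` classes and made-up family labels; a real inventory would more likely live in a file under version control.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class Attack:
    text: str
    family: str  # hypothetical label, e.g. "instruction override"
    source: str  # "red-team" or "production"


@dataclass
class Inventory:
    version: int = 1
    attacks: Dict[str, Attack] = field(default_factory=dict)

    @staticmethod
    def _key(text: str) -> str:
        # Normalize case and whitespace so near-identical inputs share a key.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]

    def add(self, attack: Attack) -> bool:
        """Add an attack; return False if an equivalent one is already saved."""
        key = self._key(attack.text)
        if key in self.attacks:
            return False
        self.attacks[key] = attack
        self.version += 1  # every growth step is a new version
        return True

    def to_json(self) -> str:
        """Serialize deterministically so diffs between versions stay readable."""
        payload = {
            "version": self.version,
            "attacks": {k: asdict(a) for k, a in self.attacks.items()},
        }
        return json.dumps(payload, indent=2, sort_keys=True)
```

A surprising production message then becomes one `add` call, and the duplicate check keeps the set small enough to rerun.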

Prioritize by Damage, Not by Cleverness

Not All Failures Are Equal

A data leak and an awkward tone are both failures, but only one ends up in a breach report. Rank failures by what they would actually cost, and fix in that order. This keeps limited time aimed at the failures that matter.
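Ranking by cost can be made explicit with a small severity table. The following sketch assumes a hypothetical `Failure` record and an invented three-level scale; the specific kinds and scores are placeholders that a team would replace with its own.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cost scale: higher means more expensive if it reaches users.
SEVERITY = {
    "data leak": 3,
    "unauthorized action": 3,
    "wrong advice": 2,
    "awkward tone": 1,
}
# Unclassified failures are treated as worst case until someone rates them.
UNKNOWN_SEVERITY = max(SEVERITY.values())


@dataclass(frozen=True)
class Failure:
    attack_id: str
    kind: str


def fix_order(failures: List[Failure]) -> List[Failure]:
    """Sort failures so the most damaging kinds come first (ties keep order)."""
    return sorted(
        failures,
        key=lambda f: SEVERITY.get(f.kind, UNKNOWN_SEVERITY),
        reverse=True,
    )
```

Defaulting unknown kinds to the top of the list is a design choice in this sketch: it forces someone to classify a new failure before it can be deprioritized.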

Match Test Intensity to Stakes

A prompt that can move money or expose data deserves far more attacks than one that suggests blog titles. Spreading equal effort across every prompt wastes it on low-stakes ones and starves the dangerous ones.

The practical move is to write down, for each prompt, the worst plausible outcome of a failure in a single sentence. That sentence sets the budget. A prompt whose worst case is an awkward email gets an hour; a prompt whose worst case is a regulatory incident gets days. Letting the stated worst case drive effort keeps your attention proportional to actual risk rather than spread evenly out of habit.
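The worst-case sentence and the budget it implies can be recorded next to each prompt. This is a rough sketch: the `PromptRisk` record, the tier names, and the hour figures are assumptions for illustration, and only the contrast between "an hour" and "days" comes from the text above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tiers mapped to testing hours; tune these to your own stakes.
BUDGET_HOURS = {"cosmetic": 1, "financial": 8, "regulatory": 24}


@dataclass(frozen=True)
class PromptRisk:
    prompt_name: str
    worst_case: str  # one sentence describing the worst plausible failure
    tier: str

    def __post_init__(self) -> None:
        if not self.worst_case.strip():
            raise ValueError("worst_case must be written down, not left blank")
        if self.tier not in BUDGET_HOURS:
            raise ValueError(f"unknown tier: {self.tier!r}")

    @property
    def budget_hours(self) -> int:
        return BUDGET_HOURS[self.tier]


def plan(risks: List[PromptRisk]) -> List[PromptRisk]:
    """Order prompts so the highest-stakes ones are tested first."""
    return sorted(risks, key=lambda r: r.budget_hours, reverse=True)
```

Refusing a blank `worst_case` is the point of the sketch: a prompt cannot receive a budget until someone has stated what its failure would cost.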

Fix Surgically and Re-Test Relentlessly

Isolate Every Change

Change one thing, rerun the full set, then change the next. Bundled fixes hide which edit helped and which one quietly broke a legitimate use case. Isolation keeps cause and effect visible.

Re-Test the Whole Set, Not the One Input

A fix can ripple. Rerunning only the failed input misses the regression it caused elsewhere. The full rerun is the practice that separates reliable prompts from fragile ones, a point we make repeatedly in "Where Prompt Hardening Quietly Falls Apart."
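Comparing two full runs makes a ripple visible. The sketch below assumes each run is stored as a mapping from a hypothetical case id to a pass/fail flag, where the cases include legitimate requests as well as attacks; the `compare_runs` helper is illustrative only.

```python
from typing import Dict, List

Results = Dict[str, bool]  # case id -> True if the output stayed in bounds


def compare_runs(before: Results, after: Results) -> Dict[str, List[str]]:
    """Classify every case by how it changed between two runs."""
    return {
        # Failed before the edit, passes now.
        "fixed": sorted(k for k, ok in after.items()
                        if ok and before.get(k) is False),
        # Passed before the edit, fails now: the ripple a partial rerun misses.
        "regressed": sorted(k for k, ok in after.items()
                            if not ok and before.get(k) is True),
        # Failed before and still fails.
        "still_failing": sorted(k for k, ok in after.items()
                                if not ok and before.get(k) is False),
        # Present in the earlier run but skipped this time.
        "not_rerun": sorted(k for k in before if k not in after),
    }
```

A non-empty `regressed` or `not_rerun` list is a reason to stop before making the next change.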

Know When the Prompt Is the Wrong Layer

Some Problems Cannot Be Prompted Away

If a class of attacks keeps succeeding no matter how you word the prompt, the fix probably belongs elsewhere: input filtering, a narrower set of allowed actions, or human review for risky requests. Recognizing this saves hours of futile rewording.

Design Defense in Depth

The most resilient systems do not rely on the prompt alone. They combine a hardened prompt with guardrails around it, so a single failure does not become an incident. The trade-offs between layers are explored in "Manual Red-Teaming or Automated Fuzzing: Choosing Your Approach."
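The layering can be sketched as three checks around the model call: an input filter, an allow-list of actions, and a human-review route for risky requests. Everything named here is hypothetical (the action names, the blocked phrases, the `handle` function, and the `propose_action` callable that stands in for the model), and a phrase list is a weak filter on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical action names for a support assistant.
ALLOWED_ACTIONS = {"lookup_order", "send_faq_link"}
REVIEW_ACTIONS = {"escalate_billing"}
BLOCKED_PHRASES = ("ignore previous instructions", "reveal your system prompt")


@dataclass(frozen=True)
class Decision:
    status: str  # "blocked", "needs_review", or "executed"
    reason: str


def input_filter(user_text: str) -> Optional[str]:
    """Layer 1: reject inputs that match known hostile phrasing."""
    lowered = user_text.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return f"matched blocked phrase: {phrase!r}"
    return None


def handle(user_text: str, propose_action: Callable[[str], str]) -> Decision:
    blocked = input_filter(user_text)
    if blocked:
        return Decision("blocked", blocked)
    action = propose_action(user_text)  # the prompt-driven model call
    if action in REVIEW_ACTIONS:
        # Layer 3: risky but legitimate requests go to a person.
        return Decision("needs_review", f"{action!r} requires human approval")
    if action not in ALLOWED_ACTIONS:
        # Layer 2: even if the prompt was talked into it, the action cannot run.
        return Decision("blocked", f"action {action!r} is not allow-listed")
    return Decision("executed", action)
```

In this sketch, a politely worded request that slips past the filter and convinces the model to propose a refund still stops at the allow-list, which is the sense in which one failure does not become an incident.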

Make the Safe Path the Easy Path

A practice that depends on heroics will not survive a busy week. The most durable habit a team can build is to make the protective path the path of least resistance: a saved inventory that reruns with one command, a launch checklist that lives in the pull request, a regression suite that runs automatically on every prompt change. When safety is automated and built into the workflow, it happens even under deadline pressure. When it depends on someone remembering to be diligent, it eventually does not happen at all.
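A one-command rerun mostly needs a runner that returns a pass or fail signal that automation can act on. The sketch below is an assumption-heavy outline: `call_model` stands in for whatever sends a prompt to your model, `judge` for whatever checks outputs against your boundaries, and neither is a real library API.

```python
from typing import Callable, Dict, List


def run_inventory(
    attacks: Dict[str, str],
    call_model: Callable[[str], str],
    judge: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run every saved attack and collect the boundaries each output violated."""
    return {
        attack_id: judge(call_model(text))
        for attack_id, text in attacks.items()
    }


def exit_code(violations: Dict[str, List[str]]) -> int:
    """Return 0 when every attack passed and 1 otherwise, as CI tools expect."""
    return 1 if any(violations.values()) else 0
```

Passing the result of `exit_code` to `sys.exit` in a small script is one way to let a pull-request check block a prompt change without anyone remembering to run the suite.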

Frequently Asked Questions

Which practice matters most if I can only adopt one?

Writing specific boundaries before testing. Everything else depends on it, because without a clear standard you cannot tell a pass from a fail, prioritize damage, or verify a fix. Specific boundaries make the entire rest of the discipline possible.

How do I justify the time these practices take?

Compare it to the cost of a single public failure. A prompt that leaks data, gives dangerous advice, or gets screenshotted misbehaving costs far more than the hours of testing that would have caught it. The practices are cheap insurance against expensive incidents.

Is defense in depth admitting the prompt failed?

No. It acknowledges that a prompt is one layer of a system. Even a well-hardened prompt benefits from input validation and limited permissions around it. Relying on the prompt alone is the fragile choice, not the sophisticated one.

How big should my attack inventory be?

Large enough to cover every attack family and your specific high-stakes cases, small enough that you actually rerun it. Quality and coverage beat raw count. Add an attack only when it tests behavior the existing set does not.

Can these practices be automated?

The mechanical parts can: running a saved inventory, capturing outputs, flagging changes. Judgment-heavy parts, like deciding whether a boundary was crossed in a subtle case, still benefit from human review. Automate the repetition, keep humans for the ambiguity.
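That split between repetition and ambiguity can be sketched as a triage step. The code assumes a hypothetical store of previously approved outputs and a `judge` callable that returns violated boundary names. Exact string comparison is a simplification, since model outputs often vary between runs.

```python
from typing import Callable, Dict, List


def triage(
    approved: Dict[str, str],
    current: Dict[str, str],
    judge: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Label each output 'fail', 'pass', or 'human_review'."""
    verdicts: Dict[str, str] = {}
    for attack_id, output in current.items():
        if judge(output):
            # An automatic check caught a clear violation.
            verdicts[attack_id] = "fail"
        elif approved.get(attack_id) == output:
            # Identical to an output a person already approved.
            verdicts[attack_id] = "pass"
        else:
            # Changed or new output with no clear violation: a person decides.
            verdicts[attack_id] = "human_review"
    return verdicts
```

The automated checks settle the clear cases, and the `human_review` bucket holds only the changed outputs that still need judgment.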

Key Takeaways

  • Separate the prompt author from the attacker, since authors test their own assumptions.
  • Write specific, testable boundaries before writing any attacks.
  • Maintain a living, versioned attack inventory and grow it from real traffic.
  • Prioritize fixes by potential damage and match test intensity to stakes.
  • Make one change at a time, rerun the full set, and move defenses to other layers when the prompt cannot hold.
