Plays, Triggers, and Owners for Running System Prompts in Production

Agency Script Editorial, Editorial Team · July 9, 2024 · 7 min read

Tags: system prompts, system prompts playbook, system prompts guide, prompt engineering

A guide tells you what a system prompt is. A playbook tells you what to do on Tuesday when the assistant starts giving longer answers than it did last week and nobody changed the prompt. The difference matters because system prompts in production are not a one-time authoring task. They are a living asset that drifts, breaks, and gets quietly edited by three different people until no one knows what the current behavior is supposed to be.

This is an operating model, not a tutorial. It assumes you already have a working assistant and a system prompt that mostly does its job. What you lack is a repeatable set of moves for the recurring situations: a new failure report, a model upgrade, a feature request that wants to bolt another rule onto an already-crowded prompt. Each play below has a trigger that tells you when to run it, a sequence of steps, and a clear owner so decisions do not stall.

Treat the plays as named procedures your team can call by name. "Run the precedence audit" should mean the same thing to everyone. That shared vocabulary is half the value.
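One way to make that shared vocabulary concrete is a small registry kept next to the prompt. The sketch below is illustrative, not a tool this article prescribes. The `Play` type and `call_play` helper are hypothetical names, and the triggers and owners are paraphrased from the six plays that follow.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Play:
    name: str
    trigger: str
    owner: str


# Hypothetical registry; triggers and owners paraphrased from the six plays in this article.
PLAYBOOK: Dict[str, Play] = {
    p.name: p
    for p in [
        Play("new-failure triage", "a report of the assistant behaving wrongly",
             "prompt maintainer on rotation"),
        Play("precedence audit", "several rules added since the last audit, or two instructions fighting",
             "whoever last shipped a prompt change"),
        Play("model migration", "moving to a new model version or provider",
             "engineering lead for the assistant"),
        Play("feature-request intake", "someone wants the assistant to do a new thing",
             "product owner for the assistant"),
        Play("pre-launch adversarial pass", "any prompt change about to reach production",
             "prompt maintainer"),
        Play("drift check", "behavior changes without a prompt edit, or a scheduled interval passes",
             "prompt maintainer on rotation"),
    ]
}


def call_play(name: str) -> Play:
    """Look up a play by its shared name (case- and whitespace-insensitive)."""
    key = name.strip().lower()
    if key not in PLAYBOOK:
        raise KeyError(f"unknown play {name!r}; known plays: {sorted(PLAYBOOK)}")
    return PLAYBOOK[key]
```

An unknown name raises instead of guessing, which keeps the vocabulary closed: a new procedure has to be added to the registry before anyone can call it.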

The operating principles behind every play

Before the plays themselves, three principles govern how they run. Skip these and the plays become bureaucracy.

One source of truth for the live prompt

The current production system prompt lives in exactly one place, under version control, with a changelog. Not in someone's notes, not pasted into three dashboards. If you cannot answer "what is the exact prompt in production right now" in ten seconds, fix that before anything else.
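A minimal sketch of the ten-second check, assuming (hypothetically) that each changelog entry records a SHA-256 hash of the full prompt text it shipped. `ChangelogEntry`, `fingerprint`, and `check_single_source` are illustrative names; `hashlib` is the Python standard library.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ChangelogEntry:
    version: str
    summary: str
    prompt_sha256: str  # hash of the exact prompt text this entry shipped (assumed convention)


def fingerprint(prompt_text: str) -> str:
    """Stable identifier for an exact prompt text."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def check_single_source(live_prompt: str, changelog: List[ChangelogEntry]) -> str:
    """Return the changelog version of the live prompt, or fail if they disagree."""
    if not changelog:
        raise ValueError("no changelog entries: the live prompt has no recorded history")
    latest = changelog[-1]
    if latest.prompt_sha256 != fingerprint(live_prompt):
        raise ValueError(f"live prompt does not match changelog version {latest.version}")
    return latest.version
```

Comparing the deployed prompt against the latest entry answers "what is in production" with a version number, and fails loudly when someone edited a dashboard copy instead of the repo.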

Every change is a hypothesis

A prompt edit is a claim that behavior will improve. Claims get tested, not merged on intuition. This keeps well-meaning tweaks from silently degrading behavior nobody was watching.
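Treating an edit as a hypothesis implies an acceptance rule. The sketch below uses one possible rule, assumed here rather than taken from this article: the edit must fix at least one eval case and break none. Eval results are modeled as a hypothetical map from case id to pass/fail.

```python
from typing import Dict, List, Tuple


def evaluate_hypothesis(before: Dict[str, bool],
                        after: Dict[str, bool]) -> Tuple[bool, List[str], List[str]]:
    """Compare eval results for the same cases before and after a prompt edit.

    Returns (accept, newly_fixed, newly_broken).
    """
    if before.keys() != after.keys():
        raise ValueError("both runs must cover the same eval cases")
    newly_fixed = sorted(c for c in before if after[c] and not before[c])
    newly_broken = sorted(c for c in before if before[c] and not after[c])
    # The edit claimed an improvement: require at least one fix and zero regressions.
    accept = bool(newly_fixed) and not newly_broken
    return accept, newly_fixed, newly_broken
```

Rejecting an edit that changes nothing is deliberate: a claim of improvement with no evidence is not merged on intuition.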

Ownership is explicit

Each play names an owner who makes the final call. Shared ownership means no ownership. The owner is not necessarily the person who writes the change; they are the person accountable for whether it ships.

Play 1: The new-failure triage

Trigger: a user, stakeholder, or monitoring alert reports the assistant behaving wrongly.

Owner: the prompt maintainer on rotation.

Start by reproducing the failure with the exact input. Many reported failures cannot be reproduced, which usually means the trigger was a per-session factor, such as conversation history or retrieved context, rather than the prompt. If you can reproduce it, classify the cause: is it a missing rule, a contradiction between existing rules, a phrasing weakness, or a model limitation no prompt can fix? The classification determines the fix.

Only after classification do you touch the prompt. The instinct to immediately add a rule is what bloats prompts into incoherence. Frequently the right fix is editing an existing rule, not adding a new one. Our list of 7 Common Mistakes with System Prompts (and How to Avoid Them) maps most triage outcomes to a known pattern.
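The classification step can be written down so that no one skips straight to adding a rule. This is a hypothetical encoding: the cause labels come from the triage play, but the `triage` helper and the wording of each next action are illustrative.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    NOT_REPRODUCIBLE = "not reproducible"
    MISSING_RULE = "missing rule"
    CONTRADICTION = "contradiction between existing rules"
    PHRASING = "phrasing weakness"
    MODEL_LIMITATION = "model limitation"


# Hypothetical mapping from triage classification to the first move.
NEXT_ACTION = {
    Cause.NOT_REPRODUCIBLE: "inspect per-session inputs; do not edit the prompt",
    Cause.MISSING_RULE: "add one rule",
    Cause.CONTRADICTION: "pick the winning rule and delete or rewrite the loser",
    Cause.PHRASING: "rewrite the existing rule in place",
    Cause.MODEL_LIMITATION: "handle outside the prompt",
}


def triage(reproduced: bool, cause: Optional[Cause] = None) -> str:
    """Return the first move for a failure report; refuse to proceed without a cause."""
    if not reproduced:
        return NEXT_ACTION[Cause.NOT_REPRODUCIBLE]
    if cause is None or cause is Cause.NOT_REPRODUCIBLE:
        raise ValueError("classify a reproduced failure before anyone edits the prompt")
    return NEXT_ACTION[cause]
```

Only one of the five outcomes adds a rule, which mirrors the point that the right fix is frequently an edit to what is already there.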

Play 2: The precedence audit

Trigger: the prompt has grown by several rules since the last audit, or two instructions appear to be fighting.

Owner: whoever last shipped a prompt change.

Read the entire prompt as if you were the model, top to bottom, and flag every pair of instructions that cannot both be satisfied. List them. For each pair, decide which rule wins and delete or rewrite the loser. Do not leave both in place hoping the model sorts it out; it will not do so reliably, and it may pick a different winner from one run to the next.
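The reading and flagging in this play are human judgment, but the bookkeeping can be enforced. A minimal sketch, assuming rules are stored under hypothetical ids: it takes the flagged pairs and the chosen winners, deletes each loser, and refuses to finish while any flagged pair has no winner. Rewriting the loser instead of deleting it is the other option the play allows; that edit happens by hand.

```python
from typing import Dict, FrozenSet, List, Tuple


def resolve_conflicts(rules: Dict[str, str],
                      flagged: List[Tuple[str, str]],
                      winners: Dict[FrozenSet[str], str]) -> Dict[str, str]:
    """Return the rule set with the losing rule of every flagged pair removed."""
    remaining = dict(rules)  # never mutate the caller's copy of the prompt rules
    for a, b in flagged:
        pair = frozenset((a, b))
        winner = winners.get(pair)
        if winner not in pair:
            # Leaving both rules in place is exactly what the audit forbids.
            raise ValueError(f"unresolved conflict between {a} and {b}")
        loser = b if winner == a else a
        remaining.pop(loser, None)
    return remaining
```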

This audit should run on a schedule, not only when something breaks, because contradictions accumulate invisibly. The compression techniques in System Prompts: Best Practices That Actually Work pair naturally with this play.

Play 3: The model migration

Trigger: you are moving to a new model version or provider.

Owner: the engineering lead for the assistant.

A prompt tuned for one model is a draft for the next. Do not assume parity. Run your full evaluation set against the new model with the existing prompt unchanged, record every behavior difference, then adjust. Models differ in instruction-following strictness, refusal tendencies, and how they weight ordering. Budget real time for this; migrations that assume drop-in compatibility are how teams ship regressions.

What to re-test specifically

  • Hard constraints, since a stricter model may now over-refuse
  • Output format adherence, which often shifts subtly
  • Tone and verbosity, which rarely transfer cleanly
  • Edge cases from your adversarial set

Resist the temptation to fix every difference at once. Record them all first, then triage: some differences are improvements you want to keep, some are neutral, and only a subset are regressions worth a prompt change. Migrating teams that start editing before they finish observing tend to chase symptoms and miss the pattern.
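The observe-then-triage order can be enforced mechanically. In this sketch, eval outputs are hypothetical maps from case id to response text, and "different" means any change in the string, a deliberate simplification; real comparisons are usually graded by rubric or reviewer. Prompt changes are only considered for differences labeled as regressions, and only once every difference has a label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Difference:
    case_id: str
    old_output: str
    new_output: str
    verdict: Optional[str] = None  # set during triage: "improvement", "neutral", or "regression"


def record_differences(old: Dict[str, str], new: Dict[str, str]) -> List[Difference]:
    """Observe first: list every eval case whose output changed, without editing anything."""
    mismatched = old.keys() ^ new.keys()
    if mismatched:
        raise ValueError(f"runs cover different cases: {sorted(mismatched)}")
    return [Difference(cid, old[cid], new[cid]) for cid in sorted(old) if old[cid] != new[cid]]


def regressions(diffs: List[Difference]) -> List[Difference]:
    """Triage second: refuse to pick prompt fixes until every difference has a verdict."""
    untriaged = [d.case_id for d in diffs if d.verdict is None]
    if untriaged:
        raise ValueError(f"untriaged differences: {untriaged}")
    return [d for d in diffs if d.verdict == "regression"]
```

The old prompt runs unchanged against both models, so every recorded difference is attributable to the model, not to a concurrent edit.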

Play 4: The feature-request intake

Trigger: someone wants the assistant to do a new thing.

Owner: the product owner for the assistant.

Before adding anything to the prompt, ask whether the new behavior belongs in the system prompt at all. Per-request behavior belongs in the user message or context. Only genuinely durable, every-conversation behavior earns a place in the system prompt. This single gate prevents most prompt bloat. The reusable structure in A Framework for System Prompts gives intake a consistent slot for each kind of rule.
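The intake gate reduces to two questions. A minimal sketch with a hypothetical `FeatureRequest` record; in practice the product owner answers both questions, and the code only makes the rule explicit.

```python
from dataclasses import dataclass


@dataclass
class FeatureRequest:
    description: str
    every_conversation: bool  # must hold in every conversation, not just some requests
    durable: bool             # expected to stay true across releases, not a one-off


def intake_destination(req: FeatureRequest) -> str:
    """Only durable, every-conversation behavior earns a place in the system prompt."""
    if req.every_conversation and req.durable:
        return "system prompt"
    return "user message or context"
```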

Play 5: The pre-launch adversarial pass

Trigger: any prompt change about to reach production.

Owner: the prompt maintainer.

Run the change against your adversarial test set: override attempts, malformed input, out-of-scope questions, and contradictory requests. A change that passes the happy path but fails adversarial inputs does not ship. This play is the gate that stops a change that only looked good in testing from becoming a production incident.
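A sketch of the gate as a ship/no-ship check. The four category names mirror the adversarial inputs listed above; `assistant` stands in for whatever function calls your model, and each case's pass check is a hypothetical predicate you would write per case. The gate also fails if any category has no cases, so the set cannot quietly shrink.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Categories named in the pre-launch play.
REQUIRED = {"override", "malformed", "out_of_scope", "contradictory"}


@dataclass
class Case:
    category: str
    prompt: str
    passes: Callable[[str], bool]  # hypothetical per-case check on the assistant's output


def adversarial_gate(assistant: Callable[[str], str],
                     cases: List[Case]) -> Tuple[bool, List[str]]:
    """Return (ship, problems); any problem blocks the change."""
    covered = {c.category for c in cases}
    problems = [f"no cases for category: {cat}" for cat in sorted(REQUIRED - covered)]
    for case in cases:
        if not case.passes(assistant(case.prompt)):
            problems.append(f"{case.category}: {case.prompt!r}")
    return (not problems, problems)
```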

Play 6: The drift check

Trigger: behavior changes without a corresponding prompt edit, or a scheduled interval passes.

Owner: the prompt maintainer on rotation.

Sometimes the prompt is untouched but behavior still shifts, usually because an upstream model update changed how instructions are interpreted, or because a per-session input pattern changed. The drift check confirms the live behavior still matches the documented intent. Run your regression set, compare against the expected outcomes, and investigate any divergence. Drift caught early is a quick fix; drift discovered through a user complaint is an incident with an audience.
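A minimal drift check, assuming each regression case has a documented expected phrase and that you recorded a baseline mean word count when the prompt last shipped. Both are simplifications chosen for illustration. The verbosity comparison targets the longer-answers symptom from the opening of this article, and the 25% tolerance is an arbitrary placeholder.

```python
from statistics import mean
from typing import Dict, List


def drift_check(live: Dict[str, str], must_contain: Dict[str, str],
                baseline_mean_words: float, tolerance: float = 0.25) -> List[str]:
    """Compare live regression-set outputs with documented intent; return findings to investigate."""
    if not live:
        return ["no live outputs captured"]
    findings = []
    # Expected outcome per case, expressed here as a phrase the answer should contain.
    for case_id, phrase in sorted(must_contain.items()):
        output = live.get(case_id)
        if output is None:
            findings.append(f"{case_id}: no live output")
        elif phrase.lower() not in output.lower():
            findings.append(f"{case_id}: missing expected phrase {phrase!r}")
    # Verbosity drift: mean word count moved more than `tolerance` relative to the baseline.
    words = mean(len(text.split()) for text in live.values())
    if abs(words - baseline_mean_words) > tolerance * baseline_mean_words:
        findings.append(f"verbosity drift: mean {words:.1f} words vs baseline {baseline_mean_words:.1f}")
    return findings
```

An empty result means live behavior still matches the documented intent; any finding starts an investigation before a user reports it.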

Common drift sources

  • Silent model updates from your provider
  • Shifts in the kind of input users actually send
  • Accumulated small edits that crossed a coherence threshold
  • Context or memory layers feeding different information than before

Sequencing the plays

The plays are not independent. A new feature (Play 4) should be followed by a precedence audit (Play 2) and a pre-launch pass (Play 5) before it ships. A model migration (Play 3) triggers a full adversarial pass on everything, not just changed rules. Build these dependencies into your process so the right plays fire automatically rather than relying on someone remembering. A documented Repeatable Workflow for System Prompts is where this sequencing gets encoded.
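Encoding the dependencies as data is one way to make the right plays fire automatically. The sketch below is hypothetical. It records the two sequences described above and adds an assumption of its own: triage and audit edits also route through the adversarial pass, because Play 5 covers any prompt change. The traversal returns plays in dependency order (a reverse post-order, valid while the graph has no cycles). The scope of the pass, full set versus changed rules, is not modeled.

```python
from typing import Dict, List, Set

# Hypothetical dependency table: each play lists the plays that must follow it.
FOLLOW_UPS: Dict[str, List[str]] = {
    "feature_intake": ["precedence_audit", "adversarial_pass"],
    "model_migration": ["adversarial_pass"],
    "new_failure_triage": ["adversarial_pass"],  # assumption: triage edits change the prompt
    "precedence_audit": ["adversarial_pass"],    # assumption: audit edits change the prompt
}


def plays_to_run(start: str) -> List[str]:
    """Return `start` and every play it triggers, each after the plays it depends on."""
    seen: Set[str] = set()
    post_order: List[str] = []

    def visit(play: str) -> None:
        if play in seen:
            return
        seen.add(play)
        for nxt in FOLLOW_UPS.get(play, []):
            visit(nxt)
        post_order.append(play)  # appended only after everything that must follow it

    visit(start)
    return list(reversed(post_order))
```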

Frequently Asked Questions

How often should we run the precedence audit?

On a fixed cadence tied to change volume, not the calendar. A high-traffic assistant getting weekly edits warrants an audit every few changes. A stable one can go longer. The trigger is accumulated change, not elapsed time.
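Tying the audit to accumulated change can be as simple as counting edits in the changelog since the last audit. A sketch, assuming changelog entries are tagged "edit" or "audit"; the default threshold of three stands in for "every few changes" and is not a recommendation from this article.

```python
from typing import List


def audit_due(changelog_kinds: List[str], threshold: int = 3) -> bool:
    """Return True once `threshold` prompt edits have landed since the most recent audit.

    changelog_kinds lists entries oldest-first, each either "edit" or "audit".
    """
    edits_since_audit = 0
    for kind in reversed(changelog_kinds):  # walk back from the newest entry
        if kind == "audit":
            break
        if kind == "edit":
            edits_since_audit += 1
    return edits_since_audit >= threshold
```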

Who should own the system prompt overall?

A single accountable person, even on a team. Multiple owners produce inconsistent edits and orphaned rules. That person reviews every change, even ones they did not write, so coherence stays intact.

Should the playbook live in the same repo as the prompt?

Yes. Keep the prompt, its changelog, the evaluation set, and the playbook together so anyone touching the prompt sees the procedures alongside it. Separation is how teams forget the process exists.

What if a play conflicts with shipping speed?

The adversarial pass is the one play you never skip for speed, because skipping it trades minutes saved for production incidents. Other plays can be lightweight under deadline, but the pre-launch gate stays.

How do we onboard someone to the playbook?

Have them shadow each play once with the current owner, then run the next instance themselves with review. The named-play vocabulary makes this fast: they learn six procedures, not an undocumented art.

Key Takeaways

  • Run system prompts as a living asset with named plays, not as a one-time authoring task.
  • Keep one version-controlled source of truth for the live prompt and a changelog beside it.
  • Triage failures by cause before editing, since the right fix is often a rewrite, not a new rule.
  • Treat model migrations as full re-tuning efforts, never drop-in replacements.
  • Gate every change behind an adversarial pass, the one play that is never skipped for speed.
