Plays, Triggers, and Owners for Running System Prompts in Production

A guide tells you what a system prompt is. A playbook tells you what to do on Tuesday when the assistant starts giving longer answers than it did last week and nobody changed the prompt. The difference matters because system prompts in production are not a one-time authoring task. They are a living asset that drifts, breaks, and gets quietly edited by three different people until no one knows what the current behavior is supposed to be.

This is an operating model, not a tutorial. It assumes you already have a working assistant and a system prompt that mostly does its job. What you lack is a repeatable set of moves for the recurring situations: a new failure report, a model upgrade, a feature request that wants to bolt another rule onto an already-crowded prompt. Each play below has a trigger that tells you when to run it, a sequence of steps, and a clear owner so decisions do not stall.

Treat the plays as named procedures your team can call by name. "Run the precedence audit" should mean the same thing to everyone. That shared vocabulary is half the value.

The operating principles behind every play

Before the plays themselves, three principles govern how they run. Skip these and the plays become bureaucracy.

One source of truth for the live prompt

The current production system prompt lives in exactly one place, under version control, with a changelog. Not in someone's notes, not pasted into three dashboards. If you cannot answer "what is the exact prompt in production right now" in ten seconds, fix that before anything else.
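One way to make the ten-second answer cheap is to fingerprint the prompt text, so the version-controlled copy and whatever is actually deployed can be compared at a glance. The following is a minimal sketch, assuming both copies are available as strings; the function name and the sample prompt text are hypothetical.

```python
import hashlib


def prompt_fingerprint(prompt_text: str) -> str:
    """Return a short, stable fingerprint for a prompt.

    Line endings are normalised and outer whitespace is stripped so that
    cosmetic differences between copies do not change the fingerprint.
    """
    normalised = prompt_text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]


# Hypothetical contents: the copy in the repo and the copy read back
# from production.
repo_prompt = "You are a support assistant.\nAnswer in two sentences.\n"
deployed_prompt = "You are a support assistant.\r\nAnswer in two sentences."

in_sync = prompt_fingerprint(repo_prompt) == prompt_fingerprint(deployed_prompt)
print("in sync:", in_sync)
```

Recording the fingerprint next to each changelog entry is one option for tying a production incident back to the exact prompt version that was live.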

Every change is a hypothesis

A prompt edit is a claim that behavior will improve. Claims get tested, not merged on intuition. This keeps well-meaning tweaks from silently degrading behavior nobody was watching.
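Treating an edit as a hypothesis can be as simple as writing the claim down and checking it against per-case results for the old and new prompt. This is a sketch under stated assumptions: the record fields, the case names, and the rule "at least one target case fixed and nothing regressed" are illustrative, not a prescribed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptChange:
    summary: str
    hypothesis: str          # the behavior the edit is claimed to improve
    target_cases: frozenset  # evaluation cases the edit is supposed to fix


def evaluate_change(change, baseline, candidate):
    """Compare pass/fail results (case id -> bool) for old and new prompt."""
    fixed = sorted(
        case for case in change.target_cases
        if candidate.get(case, False) and not baseline.get(case, False)
    )
    regressed = sorted(
        case for case, passed in baseline.items()
        if passed and not candidate.get(case, False)
    )
    # The hypothesis holds only if something improved and nothing broke.
    return {
        "fixed": fixed,
        "regressed": regressed,
        "supported": bool(fixed) and not regressed,
    }


# Hypothetical example: the edit fixes verbosity but breaks another case.
change = PromptChange(
    summary="Cap answers at two sentences",
    hypothesis="Answers to simple questions become shorter",
    target_cases=frozenset({"long_answer"}),
)
baseline = {"long_answer": False, "refund_policy": True, "greeting": True}
candidate = {"long_answer": True, "refund_policy": False, "greeting": True}

result = evaluate_change(change, baseline, candidate)
print(result)
```

In this example the targeted case is fixed, but a case nobody was trying to change regresses, so the claim is not supported as written.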

Ownership is explicit

Each play names an owner who makes the final call. Shared ownership means no ownership. The owner is not necessarily the person who writes the change; they are the person accountable for whether it ships.

Play 1: The new-failure triage

Trigger: a user, stakeholder, or monitoring alert reports the assistant behaving wrongly.

Owner: the prompt maintainer on rotation.

Start by reproducing the failure with the exact input. Many reported failures cannot be reproduced, which usually means the cause was a per-session factor rather than the prompt. If you can reproduce it, classify the cause: is it a missing rule, a contradiction between existing rules, a phrasing weakness, or a model limitation no prompt can fix? The classification determines the fix.

Only after classification do you touch the prompt. The instinct to immediately add a rule is what bloats prompts into incoherence. Frequently the right fix is editing an existing rule, not adding a new one. Our list of 7 Common Mistakes with System Prompts (and How to Avoid Them) maps most triage outcomes to a known pattern.
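The triage order described above (reproduce, classify, and only then edit) can be sketched as a small function. The four causes come from the text; the next-step wording attached to each one is an illustrative assumption, as are all the names.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    MISSING_RULE = "missing rule"
    CONTRADICTION = "contradiction between existing rules"
    PHRASING = "phrasing weakness"
    MODEL_LIMITATION = "model limitation"


# Hypothetical mapping from classified cause to the next step.
NEXT_STEP = {
    Cause.MISSING_RULE: "check whether an existing rule can be edited; add a rule only if not",
    Cause.CONTRADICTION: "decide which rule wins and delete or rewrite the other",
    Cause.PHRASING: "rewrite the existing rule",
    Cause.MODEL_LIMITATION: "leave the prompt unchanged; a prompt edit cannot fix this",
}


def triage(reproduced: bool, cause: Optional[Cause] = None) -> str:
    """Return the next step for a reported failure."""
    if not reproduced:
        return "do not edit the prompt; look for a per-session factor"
    if cause is None:
        # Enforces the ordering: classification comes before any edit.
        raise ValueError("a reproduced failure must be classified before any edit")
    return NEXT_STEP[cause]


print(triage(reproduced=False))
print(triage(reproduced=True, cause=Cause.PHRASING))
```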

Play 2: The precedence audit

Trigger: the prompt has grown by several rules since the last audit, or two instructions appear to be fighting.

Owner: whoever last shipped a prompt change.

Read the entire prompt as if you were the model, top to bottom, and flag every pair of instructions that cannot both be satisfied. List them. For each pair, decide which rule wins and delete or rewrite the loser. Do not leave both in place hoping the model sorts it out; it will not, and it will pick differently across runs.
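Finding contradictions is a reading task, but the bookkeeping around it can be mechanical. The sketch below assumes a hypothetical ledger in which each flagged pair is recorded and the audit only counts as complete when every pair has a winner and a decision for the losing rule. The rule texts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conflict:
    rule_a: str
    rule_b: str
    winner: Optional[str] = None  # the rule that wins, once decided
    action: Optional[str] = None  # "delete" or "rewrite" for the losing rule


def unresolved(conflicts):
    """Return conflicts that still lack a valid winner or action."""
    return [
        c for c in conflicts
        if c.winner not in (c.rule_a, c.rule_b)
        or c.action not in ("delete", "rewrite")
    ]


# Hypothetical audit ledger with one resolved and one open pair.
conflicts = [
    Conflict(
        "Always answer in one sentence.",
        "Include a full step-by-step explanation.",
        winner="Always answer in one sentence.",
        action="rewrite",
    ),
    Conflict("Never mention pricing.", "Quote the current price when asked."),
]

open_items = unresolved(conflicts)
audit_complete = not open_items
print("audit complete:", audit_complete, "| open pairs:", len(open_items))
```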

This audit should run on a schedule, not only when something breaks, because contradictions accumulate invisibly. The compression techniques in System Prompts: Best Practices That Actually Work pair naturally with this play.

Play 3: The model migration

Trigger: you are moving to a new model version or provider.

Owner: the engineering lead for the assistant.

A prompt tuned for one model is a draft for the next. Do not assume parity. Run your full evaluation set against the new model with the existing prompt unchanged, record every behavior difference, then adjust. Models differ in instruction-following strictness, refusal tendencies, and how they weight ordering. Budget real time for this; migrations that assume drop-in compatibility are how teams ship regressions.

What to re-test specifically

Hard constraints, since a stricter model may now over-refuse
Output format adherence, which often shifts subtly
Tone and verbosity, which rarely transfer cleanly
Edge cases from your adversarial set

Resist the temptation to fix every difference at once. Record them all first, then triage: some differences are improvements you want to keep, some are neutral, and only a subset are regressions worth a prompt change. Migrating teams that start editing before they finish observing tend to chase symptoms and miss the pattern.
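The observe-first, triage-second order can be made explicit in tooling. This is a sketch only, assuming each evaluation case has already been reduced to a short behavior label per model; the case names, labels, and the three triage categories as string values are hypothetical.

```python
def behavior_diff(old_results, new_results):
    """Return case ids whose recorded behavior differs between the two models."""
    return sorted(
        case for case, behavior in old_results.items()
        if new_results.get(case) != behavior
    )


def regressions(differences, labels):
    """Return differences labelled as regressions.

    Refuses to proceed until every difference has been labelled, which
    keeps editing from starting before observation is finished.
    """
    unlabelled = [case for case in differences if case not in labels]
    if unlabelled:
        raise ValueError(f"label every difference first: {unlabelled}")
    return [case for case in differences if labels[case] == "regression"]


# Hypothetical results: existing prompt, unchanged, run against both models.
old_model = {"hard_constraint": "refused", "format_json": "valid",
             "tone_short": "concise", "edge_override": "held"}
new_model = {"hard_constraint": "refused", "format_json": "invalid",
             "tone_short": "verbose", "edge_override": "held"}

differences = behavior_diff(old_model, new_model)
labels = {"format_json": "regression", "tone_short": "neutral"}
to_fix = regressions(differences, labels)
print("differences:", differences)
print("worth a prompt change:", to_fix)
```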

Play 4: The feature-request intake

Trigger: someone wants the assistant to do a new thing.

Owner: the product owner for the assistant.

Before adding anything to the prompt, ask whether the new behavior belongs in the system prompt at all. Per-request behavior belongs in the user message or context. Only genuinely durable, every-conversation behavior earns a place in the system prompt. This single gate prevents most prompt bloat. The reusable structure in A Framework for System Prompts gives intake a consistent slot for each kind of rule.
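The intake gate reduces to two questions, which the following sketch encodes directly. The function and parameter names are hypothetical, and the answers still come from a person; the code only records where the behavior belongs once they are given.

```python
def placement(durable: bool, every_conversation: bool) -> str:
    """Decide where a requested behavior belongs.

    Only behavior that is both durable and needed in every conversation
    earns a place in the system prompt.
    """
    if durable and every_conversation:
        return "system prompt"
    return "user message or context"


# Hypothetical requests.
print(placement(durable=True, every_conversation=True))    # e.g. a brand-voice rule
print(placement(durable=False, every_conversation=False))  # e.g. a one-off campaign note
```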

Play 5: The pre-launch adversarial pass

Trigger: any prompt change about to reach production.

Owner: the prompt maintainer.

Run the change against your adversarial test set: override attempts, malformed input, out-of-scope questions, and contradictory requests. A change that passes the happy path but fails adversarial inputs does not ship. This play is the gate that keeps testing-only success from becoming a production incident.
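As a gate, this play can be expressed as a single decision over the adversarial results. The sketch below assumes results arrive as (category, passed) pairs; the category identifiers mirror the four kinds of input named above, and the requirement that every category be covered is an added assumption.

```python
CATEGORIES = (
    "override_attempt",
    "malformed_input",
    "out_of_scope",
    "contradictory_request",
)


def ship_decision(results):
    """Decide whether a change may ship from (category, passed) pairs.

    The change ships only if every category was exercised and no
    adversarial case failed.
    """
    covered = {category for category, _ in results}
    missing = [c for c in CATEGORIES if c not in covered]
    failed = sorted({category for category, passed in results if not passed})
    return {"ship": not missing and not failed, "failed": failed, "missing": missing}


# Hypothetical run: one out-of-scope case fails.
results = [
    ("override_attempt", True),
    ("malformed_input", True),
    ("out_of_scope", False),
    ("contradictory_request", True),
]
decision = ship_decision(results)
print(decision)
```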

Play 6: The drift check

Trigger: behavior changes without a corresponding prompt edit, or a scheduled interval passes.

Owner: the prompt maintainer on rotation.

Sometimes the prompt is untouched but behavior still shifts, usually because an upstream model update changed how instructions are interpreted, or because a per-session input pattern changed. The drift check confirms the live behavior still matches the documented intent. Run your regression set, compare against the expected outcomes, and investigate any divergence. Drift caught early is a quick fix; drift discovered through a user complaint is an incident with an audience.
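A drift check is a comparison between documented intent and live behavior, made while knowing whether the prompt itself changed. A minimal sketch, assuming expected and observed behavior are recorded as short labels per regression case; all names and sample values are hypothetical.

```python
def check_drift(prompt_changed: bool, expected: dict, observed: dict) -> dict:
    """Compare observed behavior with documented expectations.

    Divergence counts as drift only when the prompt was not edited;
    otherwise the edit itself is the first suspect.
    """
    divergent = sorted(
        case for case, want in expected.items()
        if observed.get(case) != want
    )
    return {"divergent": divergent, "drift": bool(divergent) and not prompt_changed}


# Hypothetical regression set: answers became longer with no prompt edit.
expected = {"simple_question": "two sentences", "refund_request": "links policy"}
observed = {"simple_question": "five sentences", "refund_request": "links policy"}

report = check_drift(prompt_changed=False, expected=expected, observed=observed)
print(report)
```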

Common drift sources

Silent model updates from your provider
Shifts in the kind of input users actually send
Accumulated small edits that crossed a coherence threshold
Context or memory layers feeding different information than before

Sequencing the plays

The plays are not independent. A new feature (Play 4) should be followed by a precedence audit (Play 2) and a pre-launch pass (Play 5) before it ships. A model migration (Play 3) triggers a full adversarial pass on everything, not just changed rules. Build these dependencies into your process so the right plays fire automatically rather than relying on someone remembering. A documented Repeatable Workflow for System Prompts is where this sequencing gets encoded.
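The dependencies named above can be written down as data so that tooling, rather than memory, decides which plays fire. This is a sketch under the assumption that plays are identified by short hypothetical keys; only the two dependencies stated in the text are encoded.

```python
# Follow-up plays required after a triggering play, per the sequencing above.
FOLLOW_UPS = {
    "feature_intake": ["precedence_audit", "adversarial_pass"],  # Play 4 -> 2, 5
    "model_migration": ["adversarial_pass"],                     # Play 3 -> 5 (full set)
}


def plays_to_run(trigger: str) -> list:
    """Return the triggering play followed by its required follow-ups."""
    return [trigger] + FOLLOW_UPS.get(trigger, [])


print(plays_to_run("feature_intake"))
print(plays_to_run("drift_check"))
```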

Frequently Asked Questions

How often should we run the precedence audit?

On a cadence tied to change volume rather than the calendar. A high-traffic assistant getting weekly edits warrants an audit every few changes. A stable one can go longer. The trigger is accumulated change, not elapsed time.

Who should own the system prompt overall?

A single accountable person, even on a team. Individual plays can have their own owners, but multiple owners of the prompt itself produce inconsistent edits and orphaned rules. That person reviews every change, even ones they did not write, so coherence stays intact.

Should the playbook live in the same repo as the prompt?

Yes. Keep the prompt, its changelog, the evaluation set, and the playbook together so anyone touching the prompt sees the procedures alongside it. Separation is how teams forget the process exists.

What if a play conflicts with shipping speed?

The adversarial pass is the one play you never skip for speed, because skipping it trades minutes saved for production incidents. Other plays can be lightweight under deadline, but the pre-launch gate stays.

How do we onboard someone to the playbook?

Have them shadow each play once with the current owner, then run the next instance themselves with review. The named-play vocabulary makes this fast; they learn six procedures, not an undocumented art.

Key Takeaways

Run system prompts as a living asset with named plays, not as a one-time authoring task.
Keep one version-controlled source of truth for the live prompt and a changelog beside it.
Triage failures by cause before editing, since the right fix is often a rewrite, not a new rule.
Treat model migrations as full re-tuning efforts, never drop-in replacements.
Gate every change behind an adversarial pass, the one play that is never skipped for speed.

Agency Script Editorial
