AGENCYSCRIPT

Stop Forgetting Your System Prompt Exists

Agency Script Editorial

Editorial Team

October 21, 2024 · 8 min read

Most teams write a system prompt once, paste it into a config file, and forget it exists until a user does something embarrassing. A playbook fixes that. It treats the system prompt as an operational asset with named plays, clear triggers, and an owner for each, so the prompt evolves on purpose instead of by panic.

This is not a beginner explainer. If you need the basics, read What Is a System Prompt: A Beginner's Guide first. This piece is for the person who already ships an assistant and needs a repeatable way to run, change, and defend its instruction layer.

The structure below is organized by play. Each play has a trigger that tells you when to run it, an owner who is accountable, and a sequence of steps. Borrow the ones you need and ignore the rest.

Play 1: The Initial Build

Trigger: You are standing up a new assistant or replacing a placeholder prompt.

Owner: The product owner for the feature, with engineering support.

Start with role and scope, not rules. Write one sentence that names who the assistant is and what it is for. Everything else hangs off that. Then add the four blocks that nearly every strong prompt needs:

  • Role and scope: who the assistant is and what it will not do.
  • Behavior rules: tone, format, refusal conditions, escalation paths.
  • Domain context: the facts and policies specific to your product.
  • Output format: structure, length, and any required fields.

Resist the urge to anticipate every edge case on day one. You will discover the real ones in production. Ship a tight first version and let the later plays handle the rest.
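
As a minimal sketch, the four blocks can live as separate fields and be joined into one prompt at build time. The class name, field names, section labels, and the Acme Billing content are all illustrative assumptions, not a required layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptBlocks:
    """The four blocks from Play 1. Field names are illustrative."""
    role_and_scope: str
    behavior_rules: list[str]
    domain_context: list[str]
    output_format: str


def render_prompt(blocks: PromptBlocks) -> str:
    """Join the blocks into one prompt string with labeled sections."""
    sections = [
        ("Role and scope", blocks.role_and_scope),
        ("Behavior rules", "\n".join(f"- {r}" for r in blocks.behavior_rules)),
        ("Domain context", "\n".join(f"- {c}" for c in blocks.domain_context)),
        ("Output format", blocks.output_format),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Hypothetical content, not taken from any real product.
blocks = PromptBlocks(
    role_and_scope=(
        "You are the support assistant for Acme Billing. "
        "You do not give legal or tax advice."
    ),
    behavior_rules=[
        "Keep a plain, friendly tone.",
        "Escalate refund disputes to a human agent.",
    ],
    domain_context=["Invoices are issued on the first of each month."],
    output_format="Reply in at most three short paragraphs.",
)
prompt = render_prompt(blocks)
print(prompt)
```

Keeping the blocks separate makes later edits land in a named place instead of somewhere in a wall of text.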

Play 2: The Behavior Change Request

Trigger: Someone, often sales or support, asks the assistant to do something new or stop doing something.

Owner: A single prompt maintainer, never a committee editing live.

This is where prompts rot. A request comes in, someone edits the prompt in a hurry, and three weeks later nobody knows why a rule exists. Run the change through a sequence instead:

  1. Write the request as a behavior statement: "When a user asks X, the assistant should do Y."
  2. Find the existing rule it conflicts with, if any, and decide which wins.
  3. Make the smallest edit that produces the behavior.
  4. Run the test set before merging.

The conflict check in step two is the one people skip, and skipping it is what causes regressions. The common mistakes article catalogs what happens when the check is left out.
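
One way to make the sequence hard to skip is to record each request as data and gate the merge on it. This is a sketch under assumptions: `ChangeRequest`, `ready_to_merge`, and the example rule text are hypothetical names, and `run_test_set` stands in for whatever test runner you already have. Step 3, the smallest edit, stays a human judgment.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChangeRequest:
    when: str    # the user situation (step 1)
    should: str  # the desired behavior (step 1)
    conflicts_with: list[str] = field(default_factory=list)  # existing rules (step 2)
    resolution: str = ""  # which rule wins and why (step 2)

    def behavior_statement(self) -> str:
        return f"When a user {self.when}, the assistant should {self.should}."


def ready_to_merge(change: ChangeRequest, run_test_set: Callable[[], bool]) -> bool:
    """Gate a prompt edit: conflicts must be resolved and the test set must pass."""
    if change.conflicts_with and not change.resolution:
        return False  # the conflict check was skipped
    return run_test_set()  # step 4


# Hypothetical request from sales.
change = ChangeRequest(
    when="asks about enterprise pricing",
    should="offer to book a sales call",
    conflicts_with=["Never discuss pricing."],
)
blocked = ready_to_merge(change, run_test_set=lambda: True)

change.resolution = "New rule wins for enterprise questions; the old rule still covers list prices."
allowed = ready_to_merge(change, run_test_set=lambda: True)
print(change.behavior_statement(), blocked, allowed)
```

The stored `resolution` is also the answer to "why does this rule exist" three weeks later.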

Play 3: The Incident Response

Trigger: The assistant said or did something it should not have, in production, in front of a user.

Owner: On-call engineer, escalating to the prompt maintainer.

Speed matters here, but so does restraint. A single bad output often triggers a heavy-handed rule that breaks ten good behaviors.

The sequence

  • Capture the exact input and output. Do not paraphrase.
  • Reproduce it in a test harness before touching the prompt.
  • Decide whether this is a prompt problem or a code problem. Anything safety-critical should move to code.
  • Add the failing case to your permanent test set so it can never silently return.

The last step is what turns an incident into a durable fix instead of a recurring fire.
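
A minimal sketch of the capture and reproduce steps, assuming the test set is a JSON Lines file and the assistant can be called as a plain function. The file name, field names, and the stub assistant are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def record_incident(test_set_path: Path, user_input: str, bad_output: str,
                    must_not_contain: str) -> dict:
    """Append the exact failing exchange to a JSONL test set as a permanent case."""
    case = {
        "input": user_input,            # verbatim, never paraphrased
        "observed_output": bad_output,  # verbatim, kept for the record
        "must_not_contain": must_not_contain,
        "source": "incident",
    }
    with test_set_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(case, ensure_ascii=False) + "\n")
    return case


def reproduces(case: dict, assistant) -> bool:
    """True if the assistant still emits the forbidden text for this input."""
    return case["must_not_contain"] in assistant(case["input"])


# Usage with a temporary file and a stub standing in for the real assistant.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_set.jsonl"
    case = record_incident(
        path,
        user_input="Can I get a refund of $5,000?",
        bad_output="Sure, refund approved!",
        must_not_contain="refund approved",
    )
    stored = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]

still_failing = reproduces(case, lambda text: "Sure, refund approved!")
print(still_failing)
```

Run `reproduces` before the fix to confirm the failure and after it to confirm the case now passes.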

Play 4: The Periodic Audit

Trigger: A calendar interval, monthly or quarterly, plus any major model upgrade.

Owner: The prompt maintainer, with a reviewer from outside the team.

Prompts accumulate cruft. Old rules outlive the problems they solved, examples reference deprecated features, and length creeps up edit by edit. An audit is scheduled cleanup.

Walk the prompt top to bottom and ask of every line: is this still true, is it still needed, and is it stated once. Cut what fails. Then re-run the full test set, because cleanup can change behavior in ways that feel safe but are not.
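
Two of the audit questions can be partly mechanized. The sketch below flags lines that are repeated word for word and measures how much the prompt has grown since the last audit. Both helpers are hypothetical, they only catch literal duplicates, and "still true" and "still needed" remain human calls.

```python
from collections import Counter


def repeated_lines(prompt: str) -> list[str]:
    """Return normalized non-empty lines that appear more than once."""
    normalized = [" ".join(line.lower().split()) for line in prompt.splitlines()]
    counts = Counter(line for line in normalized if line)
    return [line for line, n in counts.items() if n > 1]


def growth_ratio(previous_prompt: str, current_prompt: str) -> float:
    """Relative growth in word count since the last audit (0.25 means 25% longer)."""
    before = len(previous_prompt.split())
    if before == 0:
        raise ValueError("previous prompt is empty")
    return (len(current_prompt.split()) - before) / before


# Hypothetical prompt snapshots from two audits.
previous = "Be concise.\nNever promise refunds."
current = (
    "Be concise.\n"
    "Never promise refunds.\n"
    "be  concise.\n"
    "Mention the spring sale."
)
print(repeated_lines(current))
print(round(growth_ratio(previous, current), 2))
```

Paraphrased duplicates will slip past a check like this, which is one reason the outside reviewer still matters.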

Play 5: The Model Migration

Trigger: You are moving to a new model or provider.

Owner: Engineering, with the prompt maintainer reviewing outputs.

A system prompt is not portable. Different models weight instructions differently, handle formatting differently, and refuse differently. Assume the prompt will misbehave on the new model until proven otherwise.

Run your entire test set on the new model before you switch a single user. Pay special attention to refusals and formatting, the two areas where models diverge most. Expect to rewrite parts of the prompt, not just paste it across. Treat this as a real migration, not a config flip.
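
As a rough sketch, a migration check runs the same cases against both models and lists what passes on the old one but fails on the new one. The `Model` signature, the case format, and both stub models are assumptions. In practice each callable would wrap your provider's client.

```python
from typing import Callable

Model = Callable[[str, str], str]  # (system_prompt, user_input) -> reply


def run_test_set(model: Model, system_prompt: str, cases: list[dict]) -> list[str]:
    """Return the names of the cases that fail on this model."""
    failures = []
    for case in cases:
        reply = model(system_prompt, case["input"])
        if not case["check"](reply):
            failures.append(case["name"])
    return failures


def migration_regressions(old: Model, new: Model, system_prompt: str,
                          cases: list[dict]) -> list[str]:
    """Cases that pass on the old model but fail on the new one."""
    old_failures = set(run_test_set(old, system_prompt, cases))
    new_failures = run_test_set(new, system_prompt, cases)
    return [name for name in new_failures if name not in old_failures]


# Stub models: the "new" one refuses the same way but drops the JSON format.
def old_model(system_prompt: str, user_input: str) -> str:
    if "invoice" in user_input:
        return '{"answer": "Invoices go out on the 1st."}'
    return "I can't help with that."


def new_model(system_prompt: str, user_input: str) -> str:
    if "invoice" in user_input:
        return "Invoices go out on the 1st."
    return "I can't help with that."


cases = [
    {"name": "json_format", "input": "When is my invoice sent?",
     "check": lambda reply: reply.strip().startswith("{")},
    {"name": "refuses_legal_advice", "input": "Should I sue my landlord?",
     "check": lambda reply: "can't help" in reply.lower()},
]
regressions = migration_regressions(old_model, new_model, "system prompt here", cases)
print(regressions)
```

An empty list is the bar for switching the first user. Anything else is a rewrite task for the prompt maintainer.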

Sequencing the Plays

Plays are not independent. They feed each other, and the order matters.

How they connect

  • The Initial Build creates the prompt and the first test set.
  • Behavior Change and Incident Response both grow the test set as they run.
  • The Periodic Audit prunes what those two added.
  • Model Migration stress-tests everything the others produced.

The connective tissue across all of them is the test set. It is the institutional memory of your prompt. Every play either reads from it or writes to it. If you take one thing from this playbook, build and guard that test set. For the discipline behind it, see Best Practices That Actually Work.

Roles and Ownership

A playbook without owners is a wish list. Assign these clearly.

  • Prompt maintainer: one person who owns the canonical prompt and approves changes. Not a group.
  • Product owner: decides what the assistant should do and sets the behavior priorities.
  • On-call engineer: handles incidents and reproduces failures.
  • Outside reviewer: a fresh set of eyes for audits, to catch the rules the maintainer has gone blind to.

Diffuse ownership is the single biggest reason system prompts decay. When everyone can edit and no one is accountable, the prompt becomes a junk drawer. One maintainer, with a real review path, keeps it coherent.

Frequently Asked Questions

How often should I actually run the audit play?

Monthly for high-traffic assistants, quarterly for stable internal tools, and always after a model upgrade. The trigger is not just the calendar. If you notice the prompt has grown by a third since the last audit, or if routine behavior changes have started causing regressions, run it early. The cost of a stale prompt is silent, so do not wait for an incident to force it.

Who should own the prompt if we do not have a dedicated AI team?

Pick the person closest to the product who can also read code, usually a senior engineer or a technical product manager. The role does not require an AI specialist. It requires someone accountable who understands both the product behavior and the test set. The worst outcome is shared ownership across a whole team with no single approver.

What goes in code versus the prompt during incident response?

Anything that must never happen goes in code. Refunds above a threshold, data deletion, disclosure of regulated information: enforce these in application logic, not prompt text. The prompt handles tone, routine routing, and soft guidance. During an incident, the first question is always whether a prompt rule was the right fix or a band-aid over a missing code guardrail.
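
Here is a minimal sketch of a code guardrail for the refund example. The threshold, the `RefundAction` type, and the return values are hypothetical. The point is that the decision lives in application logic, whatever the model proposed.

```python
from dataclasses import dataclass

REFUND_LIMIT_CENTS = 10_000  # hypothetical threshold; set by policy, not by the prompt


@dataclass(frozen=True)
class RefundAction:
    """A refund the assistant has proposed. Nothing executes until code approves it."""
    order_id: str
    amount_cents: int


def authorize_refund(action: RefundAction) -> str:
    """Decide in code, regardless of what the prompt says or the model returned."""
    if action.amount_cents <= 0:
        return "rejected"
    if action.amount_cents > REFUND_LIMIT_CENTS:
        return "needs_human_approval"
    return "approved"


print(authorize_refund(RefundAction("A1", 2_500)))
print(authorize_refund(RefundAction("A2", 25_000)))
```

A prompt rule like "never refund large amounts" can still exist for tone and routing, but the check above is what holds when the model ignores it.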

Can I run these plays without a formal test set?

You can, but you will reintroduce old bugs constantly. The test set is what makes the plays repeatable instead of heroic. Even a flat file of twenty input-output pairs that you run by hand beats nothing. Start small, add a case every time something breaks, and the set will become your most valuable prompt asset within a month.
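
The flat file can be as plain as a CSV with an input column and a phrase the reply must contain. The sketch below assumes that layout. The column names, sample rows, and stub assistant are made up for illustration.

```python
import csv
import io

# Hypothetical flat file: one case per row.
FLAT_FILE = """input,expected_phrase
What are your hours?,9am to 5pm
Can you give me legal advice?,can't give legal advice
"""


def load_cases(text: str) -> list[dict]:
    """Parse the flat file into a list of {'input', 'expected_phrase'} rows."""
    return list(csv.DictReader(io.StringIO(text)))


def failing_inputs(cases: list[dict], assistant) -> list[str]:
    """Inputs whose reply does not contain the expected phrase (case-insensitive)."""
    return [
        case["input"]
        for case in cases
        if case["expected_phrase"].lower() not in assistant(case["input"]).lower()
    ]


def stub_assistant(user_input: str) -> str:
    """Stand-in for the real assistant call."""
    if "hours" in user_input:
        return "We are open 9am to 5pm, Monday to Friday."
    return "Happy to help with that!"


cases = load_cases(FLAT_FILE)
failures = failing_inputs(cases, stub_assistant)
print(failures)
```

Substring checks are crude, but they are enough to catch an old bug coming back.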

Does every behavior change really need the full sequence?

For anything user-facing, yes. The conflict check and the test run are where regressions get caught, and a "tiny" edit is exactly the kind that quietly breaks three other behaviors. For an experimental internal tool, you can move faster. The rule of thumb: the more users see it, the more discipline the change deserves.

Key Takeaways

  • Treat the system prompt as an operational asset with named plays, triggers, and a single owner.
  • The Initial Build ships tight; later plays handle the edge cases production reveals.
  • Behavior changes need a conflict check and a test run, every time, to prevent silent regressions.
  • Incident response should add the failing case to a permanent test set, not just patch and move on.
  • Audits prune accumulated cruft; model migrations require full re-testing, never a config flip.
  • The test set is the connective tissue across all plays and the memory of your prompt.
