Turning Injection Defense Into a Process You Can Hand Off

The difference between a team that defends against prompt injection well and one that does it occasionally is rarely knowledge. It is process. The first team has a documented sequence of steps that runs every time an AI feature is built or changed. The second team relies on whoever happens to remember, which means it sometimes happens and sometimes does not.

A repeatable workflow takes the techniques out of individual heads and puts them into a process that a new engineer, a contractor, or a future version of yourself can follow without supervision. It is the difference between defense as a personal skill and defense as a property of how your team works. This article walks through building that workflow stage by stage.

For the strategic context behind these steps, The Complete Guide to Prompt Injection Defense gives the foundations. Here, the goal is a process you can hand off.

Why a Workflow Beats Good Intentions

Talented engineers still skip steps under deadline pressure, especially steps that are not part of the normal flow. A workflow solves three problems at once: consistency, so the same checks run every time; transferability, so the practice does not depend on one person; and auditability, so you can show what was done and when.

The handoff test

A good workflow passes a simple test: could someone who has never seen this system pick up the document and apply it correctly? If the answer requires "ask the person who built it," the workflow is incomplete. Everything below is designed to pass that test.

There is a fourth benefit that teams discover only after they build the workflow: speed. Engineers often resist process because they assume it slows them down. In practice, a clear workflow removes the cognitive load of deciding what to do each time, which makes the work faster, not slower. The slow part of defense is figuring out what is needed from a blank page. A workflow turns that open question into a checklist, and checklists are fast.

Stage One: Classify the Feature

Every workflow run starts by understanding what you are dealing with.

Identify untrusted inputs

List every source of content the feature reads that it did not author: user messages, retrieved documents, tool outputs, file contents, external pages. If a source is untrusted, it can carry an injection. Naming them explicitly prevents the common mistake of treating retrieved content as trusted.

Determine the blast radius

Decide what a successful injection could cause. A feature that only displays text to a human is low stakes. A feature that can take actions, send, modify, pay, delete, is high stakes and gets more rigorous treatment in the stages that follow. This mirrors the prioritization logic in The Prompt Injection Defense Playbook.

Stage Two: Apply the Controls

With classification done, apply controls proportional to the stakes.

The standard control set

Structure the prompt so trusted instructions and untrusted data are clearly separated
Restrict the feature's tools and permissions to the minimum it needs
Gate any irreversible or sensitive action behind a human confirmation or hard check
Validate the model's output against expected shape before acting on it

For low blast-radius features, structure and validation may be enough. For high blast-radius features, every control applies, plus the screening added in the next stage.

The workflow should make this proportionality explicit rather than leaving it to judgment. Spell out which controls are mandatory at each blast-radius tier so an engineer following the document does not have to decide how much is enough. Left to individual judgment, the answer drifts toward whatever fits the schedule. Encoded in the workflow, the answer stays consistent across people and across deadlines, which is the entire point of having a workflow.

Record what you applied and why

Note which controls were applied and, importantly, which were deliberately skipped and why. The reasoning is what lets the next person extend or correct the work. Undocumented omissions look identical to oversights.

This record does double duty during incident reviews. If something goes wrong, the first question is always whether the control that would have stopped it was considered and rejected, or simply forgotten. A documented decision answers that instantly and turns a blame conversation into a learning one. The cost is a sentence or two per feature. The payoff is a clear trail every time someone asks why the system behaves the way it does.

Stage Three: Test Adversarially

A control you have not tried to break is a guess.

Run the attack library

Maintain a shared set of injection techniques and run them against the feature in a production-like environment. Include direct attempts in user input and indirect attempts hidden in retrieved content. Add any new technique you encounter to the shared library so every future run benefits. Ready-made scenarios for this live in Prompt Injection Defense: Real-World Examples and Use Cases.

Record results

Document what was tested, what passed, and what failed. A clean test you cannot prove is the same as no test when an incident review asks what coverage existed.

Stage Four: Review and Sign Off

Bolt the final check onto the review you already run.

Make it a review-gate item

A pull request that touches a prompt or adds an untrusted source should require a reviewer to confirm the workflow ran: feature classified, controls applied, adversarial test passed, decisions recorded. This costs a reviewer a few minutes and prevents the silent gaps that accumulate otherwise.

Stage Five: Maintain the Workflow

The workflow itself is a living artifact.

Trigger re-runs on change

Re-run the workflow whenever the feature changes: a new tool, a new data source, a model upgrade. Defenses decay as the system around them shifts, so the workflow must fire on change, not just at first build.

Keep the documentation current

When you learn a new attack or retire an obsolete control, update the workflow document so the next run reflects current reality. A stale workflow gives false confidence, which is its own risk, as covered in The Hidden Risks of Prompt Injection Defense.

Frequently Asked Questions

How detailed should the workflow document be?

Detailed enough that a competent newcomer could follow it without asking the original author questions. That usually means concrete steps, named control options, and recorded decisions rather than high-level principles. If it needs a verbal explanation to use, it is not finished.

Does every feature need the full workflow?

The classification stage applies to every feature, but the depth of controls and testing scales with blast radius. A read-only summarizer gets a lighter pass than an agent that can take irreversible actions. The workflow tells you how much rigor each feature earns.

How do we keep the workflow from being skipped under deadline pressure?

Attach it to an existing gate, usually code review, so it is not optional extra work but part of shipping. When the check lives inside the normal flow, it survives busy weeks far better than a separate process people must remember.

What goes in the adversarial attack library?

A growing collection of injection techniques: direct overrides in user input, indirect instructions hidden in retrieved content, encoded or multilingual attempts, and any novel attack your team encounters. Sharing it across the team means every feature benefits from every lesson.

Who maintains the workflow over time?

A designated defense lead keeps the document and the attack library current, while individual engineers run the workflow on their own features. Maintenance is light but must have a named owner, or the document silently goes stale.

Key Takeaways

Process, not knowledge, separates consistent defense from occasional defense.
A good workflow passes the handoff test: a newcomer can apply it without asking the author.
Start every run by classifying untrusted inputs and blast radius.
Apply controls proportional to stakes, and record what you skipped and why.
Test adversarially with a shared attack library and bolt sign-off onto existing reviews.
Re-run the workflow on every change and keep the document current as you learn.

For the strategic context behind these steps, The Complete Guide to Prompt Injection Defense gives the foundations. Here, the goal is a process you can hand off.

Why a Workflow Beats Good Intentions

The handoff test

Stage One: Classify the Feature

Every workflow run starts by understanding what you are dealing with.

Identify untrusted inputs

Determine the blast radius

Stage Two: Apply the Controls

With classification done, apply controls proportional to the stakes.

The standard control set

Structure the prompt so trusted instructions and untrusted data are clearly separated
Restrict the feature's tools and permissions to the minimum it needs
Gate any irreversible or sensitive action behind a human confirmation or hard check
Validate the model's output against expected shape before acting on it

For low blast-radius features, structure and validation may be enough. For high blast-radius features, every control applies, plus the screening added in the next stage.

Record what you applied and why

Stage Three: Test Adversarially

A control you have not tried to break is a guess.

Run the attack library

Record results

Document what was tested, what passed, and what failed. A clean test you cannot prove is the same as no test when an incident review asks what coverage existed.

Stage Four: Review and Sign Off

Bolt the final check onto the review you already run.

Make it a review-gate item

Stage Five: Maintain the Workflow

The workflow itself is a living artifact.

Trigger re-runs on change

Keep the documentation current

Frequently Asked Questions

How detailed should the workflow document be?

Does every feature need the full workflow?

How do we keep the workflow from being skipped under deadline pressure?

What goes in the adversarial attack library?

Who maintains the workflow over time?

Key Takeaways

Process, not knowledge, separates consistent defense from occasional defense.
A good workflow passes the handoff test: a newcomer can apply it without asking the author.
Start every run by classifying untrusted inputs and blast radius.
Apply controls proportional to stakes, and record what you skipped and why.
Test adversarially with a shared attack library and bolt sign-off onto existing reviews.
Re-run the workflow on every change and keep the document current as you learn.

Turning Injection Defense Into a Process You Can Hand Off

Why a Workflow Beats Good Intentions

The handoff test

Stage One: Classify the Feature

Identify untrusted inputs

Determine the blast radius

Stage Two: Apply the Controls

The standard control set

Record what you applied and why

Stage Three: Test Adversarially

Run the attack library

Record results

Stage Four: Review and Sign Off

Make it a review-gate item

Stage Five: Maintain the Workflow

Trigger re-runs on change

Keep the documentation current

Frequently Asked Questions

How detailed should the workflow document be?

Does every feature need the full workflow?

How do we keep the workflow from being skipped under deadline pressure?

What goes in the adversarial attack library?

Who maintains the workflow over time?

Key Takeaways

Agency Script Editorial

Related Articles

Rolling Out AI Hallucinations Across a Team

A Model Behind an API Is Only Potential

Case Study: Large Language Models in Practice

Ready to certify your AI capability?

Turning Injection Defense Into a Process You Can Hand Off

Why a Workflow Beats Good Intentions

The handoff test

Stage One: Classify the Feature

Identify untrusted inputs

Determine the blast radius

Stage Two: Apply the Controls

The standard control set

Record what you applied and why

Stage Three: Test Adversarially

Run the attack library

Record results

Stage Four: Review and Sign Off

Make it a review-gate item

Stage Five: Maintain the Workflow

Trigger re-runs on change

Keep the documentation current

Frequently Asked Questions

How detailed should the workflow document be?

Does every feature need the full workflow?

How do we keep the workflow from being skipped under deadline pressure?

What goes in the adversarial attack library?

Who maintains the workflow over time?

Key Takeaways

Agency Script Editorial

Related Articles

Rolling Out AI Hallucinations Across a Team

A Model Behind an API Is Only Potential

Case Study: Large Language Models in Practice

Ready to certify your AI capability?