Rebuilding a Lapsing-Renewal Bot Around Explicit Turn State

Agency Script Editorial

Editorial Team

February 11, 2021 · 8 min read

A mid-sized software company ran a renewals assistant that handled inbound chats from customers whose subscriptions were lapsing. The bot's job was to identify the account, surface the renewal offer, answer objections, and either close the renewal or route to a human. On paper it worked. In practice, the team kept getting escalations from frustrated customers who said the bot "didn't listen."

This is a narrative case study of how that team diagnosed the problem, decided on a redesign, executed it over a few weeks, and measured the result. The arc is deliberately specific because the lessons live in the details — the moment they realized the transcript alone was not enough, and the exact change that turned things around.

The names and numbers here are illustrative of a common pattern rather than a single named client, but the sequence of decisions reflects how these projects actually unfold. What makes this account worth reading is not that the team succeeded — most teams eventually do — but the order in which they discovered things, because that order is where the transferable lessons live.

The Situation: A Bot That Could Not Keep Its Place

The original assistant was built the way most first versions are. Each turn, the prompt received the full chat transcript plus a system instruction describing the bot's goal. The model was expected to infer everything — which account, what had been offered, what the customer had already declined — from reading the history.

The symptoms

  • Customers were asked again for an account email they had provided minutes earlier.
  • The bot re-pitched a discount the customer had explicitly rejected.
  • After a customer agreed to renew, the bot sometimes continued objection-handling as if nothing had been decided.

The cost

Escalation rates were climbing, and the renewals team estimated that roughly one in four bot conversations ended with a customer more annoyed than when they started. For a retention product, that is precisely the wrong direction.

The Decision: Stop Inferring State, Start Tracking It

The team's lead engineer made the call that defined the project: the model would no longer be responsible for reconstructing state from the transcript. Instead, the application would maintain an explicit state object and inject it into every prompt.

What they chose to track

  • account_identified: boolean, plus the resolved account record
  • offers_presented: list of offers shown
  • offers_declined: list of offers the customer rejected with reasons
  • renewal_status: open, agreed, or declined
  • escalation_requested: boolean

This decision aligns with the principle laid out in "A Reusable Model for Tracking Dialogue State in Prompts": the model consumes state, it does not own it.
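
As a minimal sketch, the tracked fields could be modeled as a small typed object owned by the application. The field names come from the list above; the class names, the shape of the account record, and the example offer identifier are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RenewalStatus(Enum):
    OPEN = "open"
    AGREED = "agreed"
    DECLINED = "declined"


@dataclass
class DeclinedOffer:
    offer_id: str  # hypothetical identifier for an offer
    reason: str    # the customer's stated reason for declining


@dataclass
class TurnState:
    account_identified: bool = False
    account: Optional[dict] = None  # resolved account record; its shape is assumed
    offers_presented: list[str] = field(default_factory=list)
    offers_declined: list[DeclinedOffer] = field(default_factory=list)
    renewal_status: RenewalStatus = RenewalStatus.OPEN
    escalation_requested: bool = False


# Example: the account is resolved and one offer has been declined.
state = TurnState()
state.account_identified = True
state.account = {"email": "pat@example.com", "plan": "team"}
state.offers_presented.append("annual_10_off")
state.offers_declined.append(DeclinedOffer("annual_10_off", "too expensive"))
```

Using `default_factory` gives every conversation its own lists, so one customer's declined offers cannot leak into another conversation's state.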

The Execution: Three Iterations Over Three Weeks

Iteration one — the state block

The first change was purely additive. They prepended a labeled state block to the existing prompt without removing the transcript. Immediately, the re-asking-for-email problem dropped sharply, because account_identified told the model the account was already known.
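
A sketch of what that additive change might look like is below. The delimiter labels, the function names, and the sample values are assumptions; the article does not specify the exact format the team used.

```python
def render_state_block(state: dict) -> str:
    """Render known facts as a labeled block the model can read directly."""
    lines = ["[CONVERSATION STATE]"]
    for key, value in state.items():
        lines.append(f"{key}: {value}")
    lines.append("[END STATE]")
    return "\n".join(lines)


def prepend_state(existing_prompt: str, state: dict) -> str:
    # Purely additive: the existing prompt (instruction plus transcript) is untouched.
    return render_state_block(state) + "\n\n" + existing_prompt


state = {
    "account_identified": True,
    "account_email": "pat@example.com",  # illustrative value
    "renewal_status": "open",
}
existing_prompt = (
    "You are a renewals assistant.\n\n"
    "Customer: Hi, my plan is about to lapse."
)
prompt = prepend_state(existing_prompt, state)
print(prompt)
```

Because the original prompt is passed through unchanged, this step can be rolled back without touching anything else.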

Iteration two — constraints anchored to state

Next they added negative constraints: "Never present an offer that appears in offers_declined." This killed the re-pitching behavior. They drew the specific constraint patterns from "Concrete Scenarios That Reveal Whether Your Dialogue State Holds."
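
One way to anchor such rules to state is sketched below. The first rule uses the wording quoted above. The second rule and the application-side `eligible_offers` filter are assumptions added for illustration, as are the offer names.

```python
DECLINED_RULE = "Never present an offer that appears in offers_declined."


def constraints_for(state: dict) -> list[str]:
    """Build the negative rules to include in the prompt for this turn."""
    rules = [DECLINED_RULE]  # wording taken from the article
    # Hypothetical extra rule, not from the article.
    if state.get("renewal_status") == "agreed":
        rules.append("Do not continue objection handling after the customer has agreed to renew.")
    return rules


def eligible_offers(catalog: list[str], state: dict) -> list[str]:
    """Application-side guard (an assumption): never hand the model a declined offer."""
    declined = set(state.get("offers_declined", []))
    return [offer for offer in catalog if offer not in declined]


state = {"offers_declined": ["annual_10_off"], "renewal_status": "open"}
catalog = ["annual_10_off", "pause_3_months"]  # hypothetical offer names
rules = constraints_for(state)
offers = eligible_offers(catalog, state)
```

Filtering the catalog in code as well as stating the rule in the prompt means a declined offer is both forbidden and absent from what the model is given to work with.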

Iteration three — trimming the transcript

With reliable structured state, they discovered they could trim the raw transcript to the last few turns. The state block carried the durable facts; the recent transcript carried only conversational tone. This cut token cost meaningfully and, counterintuitively, improved accuracy because the model had less noise to wade through.
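
The trimming step itself can be very small. In this sketch the default window of four turns is an assumption; the article says only "the last few turns."

```python
def trim_transcript(turns: list[str], keep_last: int = 4) -> list[str]:
    """Keep only the most recent turns; durable facts live in the state block."""
    if keep_last <= 0:
        return []  # guard: turns[-0:] would return the whole list
    return turns[-keep_last:]


turns = [
    "Customer: Hi, my plan is about to lapse.",
    "Bot: I can help. What is your account email?",
    "Customer: pat@example.com",
    "Bot: Thanks. We can offer 10% off an annual plan.",
    "Customer: No thanks, that is still too expensive.",
    "Bot: Understood. Would pausing for three months help?",
]
recent = trim_transcript(turns, keep_last=2)
```

Here the email and the declined discount fall outside the window, which is safe only because both are already recorded in the state block.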

The Outcome: What Actually Moved

The team instrumented the rollout as an A/B test, sending half of eligible conversations to the new design.
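
The article does not say how conversations were split. One common approach, sketched here as an assumption, is to hash a stable conversation identifier so that the same conversation always lands in the same variant. The variant names and the identifier format are hypothetical.

```python
import hashlib


def assign_variant(conversation_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a conversation to a variant."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "explicit_state" if bucket < treatment_share else "transcript_only"


variant = assign_variant("conv-000123")
```

Hashing rather than drawing a random number per turn keeps a conversation in one variant even if it spans several requests or servers.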

Measured results

  • Re-asking for already-provided information fell to near zero in the new variant.
  • Escalations driven by "the bot didn't listen" complaints dropped by more than half.
  • Token cost per conversation decreased after the transcript trimming, despite the added state block.
  • Successful self-serve renewals rose, because conversations stayed coherent long enough to close.

The metrics they chose to watch came straight from "Reading the Signal: Metrics for Dialogue State in Prompts," which gave them a vocabulary for what "better" meant.

The Lessons That Generalized

Inference is not a feature

The most important lesson was philosophical. Asking a model to re-derive state every turn is not clever prompting; it is a liability. State the team already knows should be handed to the model, not rediscovered by it.

Constraints are where state pays off

The state object was useful for telling the model what was true, but it was transformative when used to tell the model what was off-limits. The single largest behavior fix came from a negative constraint anchored to state.

Less transcript, more structure

Trimming history once structured state existed was the surprise win. It lowered cost and raised quality at the same time, a rare combination. Teams evaluating the broader payoff will find that reasoning developed further in "Putting Numbers Behind Dialogue State Management in Prompts."

What the Team Would Do Differently

Hindsight produced a short list of things the team wished they had done from the start, and these are arguably more valuable than the wins because they save the next team the detour.

Instrument before changing anything

The team built the new design and then scrambled to add measurement so they could prove it worked. Reversing that order would have been better. Had they captured re-ask rate and escalation reasons before touching the prompt, the before-and-after story would have been cleaner and the internal sell easier.
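
A baseline of this kind does not need much machinery. The sketch below assumes each finished conversation has been labeled with whether the bot re-asked for known information and whether it escalated. The record shape, the function names, and the sample data are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationLog:
    reasked_known_info: bool
    escalated: bool
    escalation_reason: Optional[str] = None


def rate(flags) -> float:
    """Share of True values, or 0.0 for an empty sample."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def baseline_metrics(logs: list[ConversationLog]) -> dict:
    return {
        "re_ask_rate": rate(log.reasked_known_info for log in logs),
        "escalation_rate": rate(log.escalated for log in logs),
        "escalation_reasons": Counter(
            log.escalation_reason for log in logs if log.escalated
        ),
    }


# Made-up sample, not data from the case study.
logs = [
    ConversationLog(True, True, "bot did not listen"),
    ConversationLog(False, True, "billing dispute"),
    ConversationLog(False, False),
    ConversationLog(False, False),
]
metrics = baseline_metrics(logs)
```

Running the same function on logs from before and after the change gives a like-for-like comparison.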

Start with the constraint, not the state block

Their first iteration added the state block, which helped, but the largest single improvement came from the negative constraint in iteration two. In retrospect, they could have led with the constraint against re-presenting declined offers, because that was the behavior generating the angriest escalations. Fixing the most painful symptom first builds organizational confidence faster.

Treat the transcript as a liability sooner

The team kept the full transcript far longer than necessary out of caution, fearing that trimming it would lose context. When they finally trimmed it, both cost and accuracy improved. The lesson: once structured state reliably carries the durable facts, a long transcript is more noise than safety net.

How This Maps to Other Domains

The renewals assistant is specific, but the arc is not. A checkout flow, a technical support bot, and an onboarding assistant all hit the same wall — the model cannot reliably reconstruct state from history — and all resolve it the same way.

The portable sequence

  • Diagnose the re-asking and contradicting as state failures, not model failures.
  • Inject an explicit, labeled state object built from authoritative data.
  • Constrain behavior with negative rules anchored to specific state fields.
  • Trim the now-redundant transcript to recover cost and accuracy.
  • Measure the change against re-ask and escalation rates to prove it.

Any team facing "the bot doesn't listen" complaints can run this same sequence. The domains differ; the playbook does not.

Frequently Asked Questions

Why not just give the model the full transcript every time?

It works until conversations get long. At that point the model starts missing or misweighting facts buried in the history. Structured state surfaces the facts that matter, which is both cheaper and more reliable.

How long did the redesign take?

In this account, roughly three weeks across three iterations. The first two iterations delivered most of the behavioral improvement; the third reduced token cost.

What was the single highest-impact change?

Adding negative constraints anchored to state — specifically, never re-presenting a declined offer. That fixed the most damaging behavior the bot exhibited.

Did the state block increase token costs?

Initially yes, but trimming the now-redundant transcript more than offset it, producing a net reduction in tokens per conversation.

How did they prevent state from drifting out of sync with reality?

The application updated state from authoritative events — payment systems, CRM records — rather than from the model's outputs. The model never wrote canonical state.
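
That rule can be enforced in code by routing every state change through one function that checks where the event came from. In this sketch the event shapes, the source names, and the mapping from a payment event to the "agreed" status are assumptions.

```python
from copy import deepcopy

AUTHORITATIVE_SOURCES = {"crm", "payments"}  # hypothetical source names


def apply_event(state: dict, event: dict) -> dict:
    """Return updated state; events from non-authoritative sources are ignored."""
    if event.get("source") not in AUTHORITATIVE_SOURCES:
        return state  # e.g. anything emitted by the model
    new_state = deepcopy(state)  # never mutate the caller's copy
    if event["type"] == "account_resolved":
        new_state["account_identified"] = True
        new_state["account"] = event["account"]
    elif event["type"] == "renewal_paid":
        new_state["renewal_status"] = "agreed"
    return new_state


state = {"account_identified": False, "account": None, "renewal_status": "open"}

# A claim from the model does not change canonical state.
after_model = apply_event(state, {"source": "model", "type": "renewal_paid"})

# The same fact from the payment system does.
after_payment = apply_event(state, {"source": "payments", "type": "renewal_paid"})
```

Keeping the update path this narrow makes it easy to audit: if a field is wrong, the cause is an upstream event, not something the model said.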

Could a smaller team replicate this?

Yes. The core change is conceptual, not infrastructural: stop asking the model to infer state, and start injecting it. A small team can implement a state block in a day.

Key Takeaways

  • The root problem was asking the model to reconstruct state from the transcript every turn.
  • Injecting an explicit, labeled state object cut re-asking for known information to near zero.
  • Negative constraints anchored to state produced the largest behavioral improvements.
  • Trimming the transcript after adding structured state lowered cost and raised accuracy together.
  • Canonical state must come from authoritative systems, never from the model's own output.
  • Measuring with the right metrics let the team prove the redesign worked rather than assume it.
