When a multi-turn conversation goes wrong, it rarely announces itself with an error. It just quietly degrades: the assistant asks for something you already gave, acts on a choice you reversed, or loses the thread halfway through a task. These failures almost always trace back to how dialogue state was managed, and the same handful of mistakes account for most of them.
This article names seven of those failure modes. For each one, we explain why it happens, what it costs in a real product, and the corrective practice that prevents it. The point is not to shame anyone; these are the natural mistakes everyone makes the first time they build a stateful conversation. Knowing them in advance is the cheapest way to skip the painful version of learning them.
If you want the constructive counterpart, the positive patterns that avoid all of these, read Holding a Conversation Together Across Many Turns alongside this.
Mistake One: Relying on Raw History as Memory
The most common starting mistake is treating the full transcript as your state.
Why It Happens and What It Costs
Sending the whole conversation every turn is the easiest thing to build, so people start there and never move on. As the conversation grows, the transcript becomes long and noisy, every request gets slower and more expensive, and the model has to rediscover the relevant facts each turn. Important details get buried and contradictions creep in.
The fix is an explicit state object that holds the durable facts, with the transcript as optional context rather than the source of truth. We cover building one in Wiring Memory Into a Multi-Turn AI Conversation.
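For a sense of scale, here is a minimal sketch of such an object, assuming a hypothetical appointment-booking assistant. The `BookingState` name and its four fields are illustrative choices, not part of any library.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BookingState:
    """Durable facts for a hypothetical booking dialogue (illustrative fields)."""
    service: Optional[str] = None
    day: Optional[str] = None
    time: Optional[str] = None
    name: Optional[str] = None

    def known_facts(self) -> dict:
        """Return only the fields that have been filled so far."""
        return {k: v for k, v in asdict(self).items() if v is not None}


state = BookingState()
state.service = "haircut"
state.day = "Friday"
print(state.known_facts())  # {'service': 'haircut', 'day': 'Friday'}
```

The transcript can still be passed along as context, but anything the conversation depends on is read from the object.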
Mistake Two: Keeping Old Values After a Correction
When a user changes their mind, leaving the old value in place breaks everything downstream.
The Cost of Stale Data
If the state holds both "Thursday" and "Friday" after a user switched days, the model cannot tell which is current and may act on the wrong one. The conversation contradicts itself or completes with the wrong details.
The corrective practice is simple but easy to forget: on a correction, overwrite the old value so the latest value is the only value. The current state must be a single source of truth, not an accumulating log.
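A small sketch of the overwrite, assuming state is kept as a plain dictionary keyed by field name (the `apply_update` helper is hypothetical):

```python
def apply_update(state: dict, field: str, value: str) -> dict:
    """Return a new state in which `field` holds only the latest value."""
    updated = dict(state)   # copy so the caller's dict is not mutated
    updated[field] = value  # overwrite: the old value is replaced, not appended
    return updated


state = {"service": "haircut", "day": "Thursday"}
state = apply_update(state, "day", "Friday")  # the user switched days
print(state)  # {'service': 'haircut', 'day': 'Friday'}
```

The anti-pattern would be a list per field, such as `state["day"].append("Friday")`, which leaves both days in place for the model to choose between.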
Mistake Three: Injecting Too Much State
The opposite of forgetting is overstuffing.
Drowning the Model in Context
Dumping the entire state object, or the whole history, into every prompt bloats it and dilutes the model's attention. When the one fact the current turn actually needs is buried among irrelevant details, the model can miss it.
Inject only the state relevant to this turn, summarized. A focused prompt keeps both the cost down and the model's attention where it belongs.
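One way to do this, sketched under the assumption that each step of the flow has a known set of fields it needs (the `RELEVANT_FIELDS` mapping and step names are made up for illustration):

```python
# Hypothetical mapping from the current step to the fields that step needs.
RELEVANT_FIELDS = {
    "choose_time": ["service", "day"],
    "confirm": ["service", "day", "time", "name"],
}


def state_for_prompt(state: dict, step: str) -> str:
    """Summarize only the filled fields relevant to `step` as one short line."""
    wanted = RELEVANT_FIELDS.get(step, [])
    parts = [f"{f}={state[f]}" for f in wanted if state.get(f) is not None]
    return "Known so far: " + (", ".join(parts) if parts else "nothing yet")


state = {"service": "haircut", "day": "Friday", "time": None,
         "name": "Sam", "marketing_opt_in": False}
print(state_for_prompt(state, "choose_time"))
# Known so far: service=haircut, day=Friday
```

The name and the opt-in flag stay in state; they are simply left out of a prompt that does not need them.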
Mistake Four: Trusting Model-Proposed Updates Blindly
Letting the model extract and commit state without validation invites silent corruption.
When a Misread Becomes Permanent
If you ask the model to read a value from a message and write it straight into state, a single misread becomes a fact the rest of the conversation builds on. The error is invisible because nothing flagged it.
The practice is model-proposes, code-validates. Let the model extract from messy input, but have your own logic confirm the value is sensible before it touches stored state.
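A minimal sketch of that split, assuming the model's proposal arrives as a field name and a string value. The allowed services and the `validate` and `commit` helpers are hypothetical; only `date.fromisoformat` comes from the standard library.

```python
from datetime import date
from typing import Optional

ALLOWED_SERVICES = {"haircut", "shave", "coloring"}  # hypothetical options


def validate(field: str, proposed: str) -> Optional[str]:
    """Return a cleaned value if the proposal is sensible, else None."""
    value = proposed.strip()
    if field == "service":
        return value.lower() if value.lower() in ALLOWED_SERVICES else None
    if field == "date":
        try:
            date.fromisoformat(value)  # raises ValueError for impossible dates
        except ValueError:
            return None
        return value
    return None  # unknown fields are never committed


def commit(state: dict, field: str, proposed: str) -> bool:
    """Write to state only when validation passes; report whether it did."""
    cleaned = validate(field, proposed)
    if cleaned is None:
        return False  # the caller should re-ask or re-confirm
    state[field] = cleaned
    return True


state = {}
print(commit(state, "date", "2025-02-30"))  # False: February 30 does not exist
print(commit(state, "service", "Haircut"))  # True
print(state)  # {'service': 'haircut'}
```

A rejected proposal leaves state untouched, so a misread costs one extra question instead of a corrupted booking.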
Mistake Five: No Clear Completion Condition
Conversations that do not know when they are done meander.
Guessing From the Transcript
If completion is inferred loosely from the conversation rather than from the state, the assistant may declare a task finished while a required field is still empty, or keep asking after everything is collected.
Drive completion from the state object: define which fields are required, and finish only when all of them are filled. An explicit check gives the conversation a reliable finish line, a point we develop in Keeping Track of Context Across a Long AI Conversation.
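As a sketch, assuming the same kind of dictionary-based state and a hypothetical list of required fields:

```python
REQUIRED_FIELDS = ("service", "day", "time", "name")  # hypothetical requirements


def missing_fields(state: dict) -> list:
    """List required fields that are still empty, in asking order."""
    return [f for f in REQUIRED_FIELDS if not state.get(f)]


def is_complete(state: dict) -> bool:
    """The task is done only when no required field is missing."""
    return not missing_fields(state)


state = {"service": "haircut", "day": "Friday", "time": "15:00"}
print(is_complete(state))     # False
print(missing_fields(state))  # ['name']
```

The same `missing_fields` result can decide what to ask next, so the assistant neither stops early nor keeps asking once everything is collected.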
Mistake Six: Ignoring Multi-Fact Messages
Real users do not answer one question at a time.
Losing Information Volunteered Early
If your update logic only looks for the field you just asked about, a user who says "a haircut on Friday at 3, my name's Sam" gets three of those four facts dropped, and the assistant asks for them again later. It feels broken to the user.
Build extraction to capture every field present in a message, not just the one you prompted for. State is field-based, so store whatever arrives.
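The sketch below illustrates the idea. The `toy_extract` function is a regex stand-in for whatever model call does the real extraction, and its patterns are deliberately simplistic; the part that matters is that `merge` stores every recognized field that arrives.

```python
import re

KNOWN_FIELDS = ("service", "day", "time", "name")


def toy_extract(message: str) -> dict:
    """Stand-in for a model extraction call; returns every field it finds."""
    patterns = {
        "service": r"\b(haircut|shave)\b",
        "day": r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
        "time": r"\bat (\d{1,2}(?::\d{2})?)\b",
        "name": r"\bname(?:'s| is) (\w+)",
    }
    found = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, message, flags=re.IGNORECASE)
        if match:
            found[field] = match.group(1)
    return found


def merge(state: dict, extracted: dict) -> dict:
    """Store every recognized field that arrived, not just the one asked for."""
    updated = dict(state)
    for field in KNOWN_FIELDS:
        if field in extracted:
            updated[field] = extracted[field]
    return updated


# The assistant only asked for the service, but the user volunteered more.
state = merge({}, toy_extract("a haircut on Friday at 3, my name's Sam"))
print(state)
# {'service': 'haircut', 'day': 'Friday', 'time': '3', 'name': 'Sam'}
```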
Mistake Seven: No Plan for Long Conversations
Everything works until the conversation outgrows the prompt window.
Hitting the Wall
Without a strategy, a long conversation eventually exceeds the context limit and either errors out or starts silently dropping the oldest, sometimes most important, information.
The corrective practice is to summarize older turns into compact facts and prune transcript you no longer need, while preserving the structured state object. Because the durable facts live in the object, you can trim history aggressively without losing what the dialogue depends on.
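A rough sketch of the pruning step, assuming turns are stored as a list of strings. The `summarize` function is a placeholder that only records how many turns were folded away; a real system might call a model to produce a list of established facts instead.

```python
def summarize(turns: list) -> str:
    """Placeholder compression: note how many turns were folded away."""
    return f"({len(turns)} earlier turns summarized)"


def trim_history(turns: list, summary: list, keep_last: int = 4) -> tuple:
    """Fold turns older than the last `keep_last` into `summary`, then drop them."""
    if len(turns) <= keep_last:
        return turns, summary
    old, recent = turns[:-keep_last], turns[-keep_last:]
    return recent, summary + [summarize(old)]


state = {"service": "haircut", "day": "Friday"}  # durable facts, never trimmed
turns = [f"turn {i}" for i in range(1, 11)]
turns, summary = trim_history(turns, [], keep_last=4)
print(turns)    # ['turn 7', 'turn 8', 'turn 9', 'turn 10']
print(summary)  # ['(6 earlier turns summarized)']
print(state)    # {'service': 'haircut', 'day': 'Friday'}
```

Trimming touches only the transcript and the summary; the state object passes through unchanged.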
Two Subtler Mistakes Worth Knowing
Beyond the seven core failures, two quieter mistakes tend to surface once the obvious ones are handled.
Injecting Stale Summaries
If you summarize older turns to save space but never refresh the summary as state changes, the model can end up working from an outdated picture. A summary written ten turns ago may describe a choice the user has since reversed. The fix is to derive what you inject from the current state object, not from a frozen summary captured earlier. The summary is a compression of history; the live state is the truth, and the prompt should reflect the live state.
Letting the Model Manage Its Own State Unchecked
It is tempting to hand the entire job to the model: let it remember, let it decide what changed, let it track completion. This works in a demo and drifts in production. Without an explicit object your code controls, you cannot inspect, validate, or correct what the conversation believes. The discipline is to keep the state in your own structure and use the model as an assistant to it, never as the sole keeper of it. This is the same model-proposes, code-validates pattern that prevents Mistake Four.
How to Spot These Early
The common thread across all of these is that they hide on the happy path and only appear when conversations get messy or long.
Build a Habit of Adversarial Testing
- Always test a correction midway through a flow
- Always test a message that volunteers several facts at once
- Always test a conversation long enough to trigger summarization
A team that scripts these cases into its tests catches the mistakes before users do. The constructive practices that prevent all of them are gathered in Holding a Conversation Together Across Many Turns.
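As an illustration of how small such scripted cases can be, here is a sketch with one test per bullet. The `apply_turn` and `prune` helpers and the `MAX_TURNS` threshold are hypothetical stand-ins for a real state update and a real summarization step.

```python
MAX_TURNS = 6  # hypothetical threshold that triggers pruning


def apply_turn(state: dict, extracted: dict) -> dict:
    """Minimal stand-in for a state update: the latest value per field wins."""
    return {**state, **extracted}


def prune(turns: list) -> list:
    """Minimal stand-in for summarization: keep only the most recent turns."""
    return turns[-MAX_TURNS:] if len(turns) > MAX_TURNS else turns


def test_correction_midway():
    state = apply_turn({}, {"day": "Thursday"})
    state = apply_turn(state, {"day": "Friday"})
    assert state == {"day": "Friday"}  # no trace of the old value


def test_multi_fact_message():
    state = apply_turn({}, {"service": "haircut", "day": "Friday", "name": "Sam"})
    assert set(state) == {"service", "day", "name"}  # nothing volunteered is lost


def test_long_conversation_keeps_state():
    state = {"day": "Friday"}
    turns = prune([f"turn {i}" for i in range(50)])
    assert len(turns) == MAX_TURNS      # pruning actually triggered
    assert state == {"day": "Friday"}   # durable facts survived it


for test in (test_correction_midway, test_multi_fact_message,
             test_long_conversation_keeps_state):
    test()
print("all adversarial cases passed")
```

The functions follow pytest's `test_` naming convention so a runner could collect them, though the sketch also runs as a plain script.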
Frequently Asked Questions
Which of these mistakes is the most damaging?
Keeping old values after a correction tends to cause the most visible damage, because it produces confident, wrong completions: a booking on the wrong day, an order with the wrong option. Relying on raw history is the most common, but its damage is gradual rather than acute.
How do I catch these before they reach users?
Test conversations that include corrections, multi-fact messages, and long histories specifically. The happy path almost always works; these failures only appear when you exercise the messy cases. A test suite that includes them surfaces the mistakes early.
Is using raw history always wrong?
No. For short, simple conversations it is fine and avoids extra complexity. It becomes a mistake when conversations grow long enough that the transcript is slow and costly to process and starts to contradict itself. Match the approach to the length and stakes of the dialogue.
How do I validate a model-proposed state update?
Check the value against what you know it should be: a date parses to a real date, a choice matches an allowed option, a number falls in a sensible range. Reject or re-confirm anything that fails. The model proposes; your deterministic logic decides what gets committed.
What is the right way to summarize a long conversation?
Compress older turns into a short list of established facts and decisions, keep that summary, and discard the raw turns it replaces. Refresh the summary when the user reverses a decision, so it never contradicts the current state. Crucially, preserve the structured state object throughout, since it holds the facts the rest of the conversation needs.
Can these mistakes compound?
Yes, and they often do. Raw-history reliance plus stale values plus no completion check produces a conversation that is long, contradictory, and never finishes cleanly. Fixing the foundational mistake, by moving to an explicit, validated state object, tends to resolve several at once.
Key Takeaways
- Raw transcript used as memory degrades as conversations grow; use an explicit state object instead
- Overwrite old values on a correction so the current state is a single source of truth
- Inject only the state relevant to the current turn to avoid bloat and lost attention
- Validate model-proposed updates with your own logic before committing them to state
- Drive completion from required fields, capture multi-fact messages, and plan for long conversations