If your assistant forgets things across turns, you do not need a research project to fix it. You need a small, well-placed change: stop asking the model to reconstruct state from the transcript, and start handing it the state directly. This article is the fastest credible path from that realization to a first working result, without the detours that make the topic seem harder than it is.
We will cover what you need before you start, the minimal implementation that produces a real improvement, how to verify it worked, and where to go once the basics are solid. The goal is a first result you can ship, not a complete platform. Depth comes later; the win you can get this week is what builds momentum.
By the end, you will have a state block injected into your prompts and an assistant that stops re-asking for information it already has. That single behavior change is often enough to justify the whole effort.
One mindset shift makes everything that follows click: you are not teaching the model to remember better. You are taking the remembering away from the model entirely and giving it to your code. The model's job becomes responding well to facts you hand it, not reconstructing those facts from a transcript. Once you internalize that division of labor, the implementation is almost obvious.
Prerequisites
You need surprisingly little to begin.
What to have in place
- A multi-turn assistant. Single-shot prompts have no state to manage; this only matters once conversations span turns.
- A place to store state. Any per-conversation storage works — a session object, a database row, even an in-memory map for a prototype.
- The ability to edit the prompt. You must be able to control the text sent to the model each turn. If your tooling hides this, address that first, as Tooling That Tracks Conversation State Across Prompt Turns discusses.
What you do not need
You do not need a framework, a vector database, or any new infrastructure to get a first result. A state object and a string template are enough.
The Minimal Build
Here is the smallest implementation that produces a real improvement.
Step one: define a state object
Pick the handful of facts your assistant keeps re-asking for and define them as named fields:
state = {
"name": null,
"account_email": null,
"issue": null
}Step two: render state into the prompt
Before each turn, inject a labeled block:
CURRENT STATE:
- name: Dana
- account_email: dana@example.com
- issue: nullThis single step is the render stage from A Reusable Model for Tracking Dialogue State in Prompts, and it alone eliminates most re-asking.
Step three: add one constraint
Add an instruction that uses the state:
Only ask for fields whose value is null.
Never ask for a field that already has a value.Step four: update state after each turn
When the user provides a field, write it into the state object so the next turn's render reflects it.
Verifying It Worked
Do not assume the change helped — confirm it.
A quick verification
- Run a test conversation that previously triggered re-asking. The assistant should now skip the questions it already has answers to.
- Inspect the injected state block. Log it and confirm it contains the facts you expect. Most early bugs are visible here.
- Measure re-ask rate before and after. Even an informal count, drawn from Reading the Signal: Metrics for Dialogue State in Prompts, proves the improvement.
Common First-Timer Mistakes
Dumping the whole transcript and the state block
If you keep the full transcript and add a state block, you may not see much improvement because the model still gets conflicting signals. Trim older transcript turns once state carries the durable facts.
Letting the model write canonical state
The model should read state, not own it. Update state from authoritative inputs — user messages, backend confirmations — not from the model's claims about what it did.
Capturing too much
Track only the fields that matter. A bloated state object is harder to maintain and clutters the prompt. Start with the three or four facts that cause the most re-asking.
Where to Go Next
Once the minimal build works, deepen it in the order that matches your needs.
A sensible progression
- Add more constraints as you spot contradictions, following the checklist.
- Handle revisions and out-of-order input so users can change earlier answers cleanly.
- Decide on tooling only once you understand what you are buying, and build the business case before scaling the effort across more assistants.
A Worked First Build, End to End
To make the path concrete, here is how a first build actually unfolds for a simple appointment-booking assistant, from the empty state to a working improvement.
The starting point
The assistant repeatedly asks "What service do you need?" and "What day works?" even after the user has answered. Each restart of the loop frustrates users. The root cause is that the prompt only ever received the raw transcript, and the model kept losing track in longer conversations.
The change, step by step
- Define the slots:
service,preferred_day,preferred_time,confirmed. All startnull. - Render them in a labeled
BOOKING STATE:block at the top of every prompt, showing current values and explicit nulls. - Add the constraint: "Ask only for null fields. Once confirmed is true, do not reopen earlier questions."
- Update on each turn: when the user names a service, write it to the slot before building the next prompt.
The result
The assistant now asks for each piece of information exactly once, fills slots as the user provides them in any order, and stops cleanly once the booking is confirmed. A user can even provide the day and time in a single message and watch both slots fill at once. This is the same slot-filling pattern shown across domains in When a Prompt Forgets the User Already Paid: State Examples, reduced to its simplest form.
Knowing When You Have Outgrown the Minimal Build
The minimal build is a starting point, not a destination. A few signals tell you it is time to invest further, and recognizing them early prevents the assistant from regressing as it grows.
Signs you need more
- Contradictions appear even though re-asking is fixed — you need more constraints and possibly the sticky-decision guardrails from the checklist.
- Conversations span sessions and users expect continuity — you need durable, per-user storage rather than in-memory session state.
- The assistant starts taking actions like booking or charging — you need to track action results, not just collected facts.
- You are scaling to many assistants — it is time to weigh tooling and build a proper business case before repeating the work by hand everywhere.
Outgrowing the minimal build is a sign of success, not failure. It means the assistant is doing enough real work that better state management has become worth the investment.
Frequently Asked Questions
How long does a first result take?
For a focused assistant, often an afternoon. Defining a small state object, rendering it, and adding one constraint is a few hours of work that produces a visible improvement.
Do I need a framework to start?
No. A state object and a string template are enough for a first result. Frameworks become relevant later, for long or complex conversations.
What is the very first thing to fix?
Re-asking for already-provided information. It is the most common, most visible state failure and the easiest to eliminate with a rendered state block.
How do I know which facts to track?
Track the fields your assistant keeps re-asking for. Those are, by definition, the ones the model is failing to remember from the transcript.
Should I keep the transcript too?
Keep only a short recent window for tone once structured state carries the durable facts. Keeping the full transcript alongside state often muddies the signal.
What if my conversations are very short?
Then you may not need this at all. Dialogue state management pays off as conversations lengthen and the cost of forgetting rises.
Key Takeaways
- The fastest fix is to stop inferring state from the transcript and start injecting it directly.
- You need only a multi-turn assistant, per-conversation storage, and control of the prompt to begin.
- The minimal build is a state object, a rendered state block, one constraint, and an update step.
- Verify by re-running a previously failing conversation and inspecting the injected state block.
- Avoid the common mistakes: keeping the full transcript, letting the model own state, and capturing too much.
- Deepen incrementally with more constraints, revision handling, and a deliberate tooling decision.