Teams that build multi-turn assistants tend to reinvent the same scaffolding every time: somewhere to store what the user said, some logic to feed it back into the prompt, and a pile of ad-hoc instructions to stop the model from contradicting itself. The result is hard to reason about and harder to debug, because there is no shared vocabulary for the moving parts.
This article proposes a named, reusable model for that scaffolding: Capture, Render, Constrain, Reconcile. It is not a library or a product. It is a way of thinking about dialogue state that separates four concerns that teams routinely tangle together. Once separated, each becomes simpler to build, test, and reason about.
The four-stage model applies whether you are hand-rolling state in a few lines of code or adopting a framework. It is most useful when conversations run long enough that the language model can no longer reliably infer state from the raw transcript, which in practice happens sooner than most teams expect.
Why bother naming the stages at all? Because the act of naming forces a separation of concerns that ad-hoc code blurs. When capture and constrain live in the same tangled prompt, a bug in one looks like a bug in the other. When they are named, distinct stages, you can point at the failure and know which stage to fix. Vocabulary is not decoration here; it is the thing that makes the system debuggable.
Stage One: Capture
Capture is the act of turning raw conversational input into structured state.
What capture does
When a user says "I'm a team of twelve and we mostly write blog posts," capture extracts team_size: 12 and use_case: "content" rather than storing the sentence. The captured representation is structured, because everything downstream depends on fields, not prose.
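A minimal sketch of capture for the sentence above, assuming a simple regex-and-keyword extractor. In practice the extraction might be a model call or a dedicated parser. The names capture, NUMBER_WORDS, and CONTENT_KEYWORDS are hypothetical and exist only for illustration.

```python
import re

# Hypothetical lookup tables; a real extractor would cover far more cases.
NUMBER_WORDS = {"two": 2, "five": 5, "ten": 10, "twelve": 12, "twenty": 20}
CONTENT_KEYWORDS = ("blog post", "article", "newsletter")


def capture(utterance: str) -> dict:
    """Extract structured fields from one user utterance.

    Returns only the fields that were found; filler yields an empty dict.
    """
    text = utterance.lower()
    fields = {}

    # team_size: matches "team of 12" or "team of twelve"
    match = re.search(r"team of (\w+)", text)
    if match:
        token = match.group(1)
        if token.isdigit():
            fields["team_size"] = int(token)
        elif token in NUMBER_WORDS:
            fields["team_size"] = NUMBER_WORDS[token]

    # use_case: map content-related phrases onto one canonical value
    if any(keyword in text for keyword in CONTENT_KEYWORDS):
        fields["use_case"] = "content"

    return fields


state = {}
state.update(capture("I'm a team of twelve and we mostly write blog posts"))
print(state)  # {'team_size': 12, 'use_case': 'content'}
```

The point of the sketch is the output shape: downstream stages receive named fields, and an utterance with nothing worth keeping contributes nothing to state.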
When to apply it
Apply capture for any fact the assistant will need to recall, constrain, or act on later. Do not capture conversational filler. The discipline of deciding what to capture is itself valuable: over-capturing bloats state, while under-capturing forces the model back to inferring facts from the transcript. The concrete patterns in "When a Prompt Forgets the User Already Paid: State Examples" show capture working across several domains.
Stage Two: Render
Render is the act of turning structured state back into the prompt the model sees.
What render does
Render selects the relevant subset of state and formats it into a labeled block injected into the prompt. Critically, render decides what to include — a long conversation does not mean a long state block. It means a precise one.
When to apply it
Render runs every turn. The key decision is selection: include the fields this turn needs and omit the rest. Rendering the entire state object every turn is a common cause of bloated prompts and degraded response quality.
Components of good rendering
- A labeled header so the model can locate state instantly
- Explicit nulls so the model distinguishes "not yet collected" from "omitted from this turn's block"
- Backend values injected verbatim, never paraphrased
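The three components above can be sketched as a single function. This is one possible shape, not a prescribed format: the header text, the field names, and the render signature are all assumptions made for illustration.

```python
def render(state: dict, fields_for_turn: list[str]) -> str:
    """Format the selected subset of state as a labeled prompt block."""
    lines = ["### CONVERSATION STATE"]  # labeled header (wording is arbitrary)
    for name in fields_for_turn:
        value = state.get(name)
        if value is None:
            # Explicit null: the field is tracked but has no value yet.
            lines.append(f"{name}: null")
        else:
            # Values are inserted verbatim, never paraphrased.
            lines.append(f"{name}: {value}")
    lines.append("### END STATE")
    return "\n".join(lines)


state = {
    "team_size": 12,
    "use_case": "content",
    "billing_email": None,
    "payment_status": "confirmed",  # copied from the backend as-is
    "favorite_color": "green",      # irrelevant this turn, so not selected
}
block = render(state, ["team_size", "billing_email", "payment_status"])
print(block)
```

Selection happens in the caller, which passes only the fields this turn needs. The state object can grow without the rendered block growing with it.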
Stage Three: Constrain
Constrain is the act of using state to bound the model's behavior.
What constrain does
This is where state earns most of its value. Constraints tell the model what not to do, anchored to specific fields: do not re-ask for a filled slot, do not re-present a declined offer, do not reopen a finalized decision.
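As a rough sketch, the three example constraints can be derived mechanically from state rather than written by hand. The state layout here (slots, declined_offers, finalized) and the function name build_constraints are hypothetical, chosen only to mirror the examples in the paragraph above.

```python
def build_constraints(state: dict) -> list[str]:
    """Derive 'do not' instructions from specific state fields."""
    constraints = []

    # Filled slots: never ask for them again.
    for name, value in state.get("slots", {}).items():
        if value is not None:
            constraints.append(
                f"Do not ask for {name}; it is already set to {value!r}."
            )

    # Declined offers: never present them again.
    for offer in state.get("declined_offers", []):
        constraints.append(f"Do not offer {offer} again; the user declined it.")

    # Finalized decisions: never reopen them.
    for decision in state.get("finalized", []):
        constraints.append(f"Do not reopen the {decision} decision; it is final.")

    return constraints


state = {
    "slots": {"team_size": 12, "billing_email": None},
    "declined_offers": ["annual plan upgrade"],
    "finalized": ["plan tier"],
}
for line in build_constraints(state):
    print(line)
```

Each generated instruction names the field or value that justifies it, so a constraint disappears automatically when the state behind it changes.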
When to apply it
Apply constraints wherever a state value implies a forbidden action. The strongest assistants are defined as much by their constraints as by their capabilities. The checklist in "Concrete Scenarios That Reveal Whether Your Dialogue State Holds" enumerates the constraints worth encoding.
Stage Four: Reconcile
Reconcile is the act of keeping state synchronized with authoritative reality.
What reconcile does
When the payment system confirms a charge, reconcile updates payment_status. When the user revises an earlier answer, reconcile overwrites the slot and flags dependents. Reconcile ensures the captured state never drifts from the system of record.
When to apply it
Reconcile runs whenever an authoritative event occurs — a backend confirmation, a user revision, a tool result. The governing rule: the model never owns canonical state. It reads a rendered copy; the system reconciles the original.
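A minimal sketch of reconcile, assuming events arrive as plain dictionaries. The event types, the DEPENDENTS map, and the stale list are hypothetical names used to illustrate the two cases described above: a backend confirmation and a user revision that flags dependent fields.

```python
# Hypothetical dependency map: when a slot changes, these fields need rechecking.
DEPENDENTS = {"team_size": ["quoted_price"]}


def reconcile(state: dict, event: dict) -> dict:
    """Return a new state updated from one authoritative event."""
    new_state = dict(state)
    stale = set(state.get("stale", []))

    if event["type"] == "payment_confirmed":
        # Backend confirmation: copy the value from the system of record.
        new_state["payment_status"] = event["status"]
    elif event["type"] == "user_revision":
        # User revised an earlier answer: overwrite the slot, flag dependents.
        new_state[event["field"]] = event["value"]
        stale.update(DEPENDENTS.get(event["field"], []))

    new_state["stale"] = sorted(stale)
    return new_state


state = {"team_size": 12, "quoted_price": 240, "payment_status": "pending", "stale": []}
state = reconcile(state, {"type": "user_revision", "field": "team_size", "value": 15})
state = reconcile(state, {"type": "payment_confirmed", "status": "confirmed"})
print(state)
```

Note that nothing in this function reads model output. State changes only in response to events the system itself can vouch for.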
Putting the Stages Together
In a single turn, the four stages run in sequence: reconcile state from any new events, render the relevant subset into the prompt, attach the constraints that state implies, let the model respond, then capture new facts from that exchange for the next turn.
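The turn sequence can be sketched as one function with the model call passed in as a parameter. Everything here is a simplified assumption: events are field/value pairs, render includes every field for brevity, the only constraint is "do not re-ask", and capture is a toy extractor that recognizes an email address. The names run_turn and fake_model are hypothetical.

```python
from typing import Callable


def run_turn(state: dict, events: list[dict], user_text: str,
             call_model: Callable[[str], str]) -> tuple[dict, str]:
    """Run one turn: reconcile, render with constraints, respond, capture."""
    # 1. Reconcile: apply authoritative events to a copy of canonical state.
    state = dict(state)
    for event in events:
        state[event["field"]] = event["value"]

    # 2. Render: format state into a labeled block (all fields, for brevity).
    block = "\n".join(f"{k}: {'null' if v is None else v}" for k, v in state.items())

    # 3. Constrain: forbid re-asking for any field that is already filled.
    rules = "\n".join(f"Do not ask for {k}." for k, v in state.items() if v is not None)

    prompt = f"STATE\n{block}\n\nCONSTRAINTS\n{rules}\n\nUSER\n{user_text}"
    reply = call_model(prompt)

    # 4. Capture: a toy extractor that only recognizes an email address.
    for word in user_text.split():
        if "@" in word:
            state["billing_email"] = word.strip(".,")

    return state, reply


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "Thanks, noted."


state = {"team_size": 12, "payment_status": "pending", "billing_email": None}
events = [{"field": "payment_status", "value": "confirmed"}]
state, reply = run_turn(state, events, "Send receipts to ops@example.com.", fake_model)
print(state)
```

Passing the model in as a callable keeps the loop testable without a network call, and makes it plain that the model only ever sees the rendered prompt, never the state object itself.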
Why the separation matters
- Testability. Each stage can be tested in isolation — capture extraction, render selection, constraint adherence, reconciliation correctness.
- Debuggability. When something breaks, you know which stage to inspect rather than staring at one monolithic prompt.
- Portability. The same four-stage shape works across tools and frameworks, which turns any comparison of tools into a question of which stages each one handles for you.
Applying the Model to Three Common Failures
The framework earns its keep when something breaks, because it tells you exactly where to look. Consider three failures and how the stages localize each.
Failure: the assistant re-asks for known information
This is almost always a render failure. The fact was captured correctly and lives in state, but render either omitted it from the prompt or buried it where the model overlooked it. The fix is in render: include the field in a labeled block and have the instruction reference it by name. You do not need to touch capture or reconcile. If the field is clearly rendered and the model still re-asks, the remaining gap is a missing constraint anchored to that field.
Failure: the assistant repeats an action it already took
This is a constrain failure, usually compounded by a reconcile gap. The action result was not reconciled into state as "done," so render had nothing to show, and constrain had nothing to forbid. The fix touches two stages: reconcile the action result into an attempted list, then add a constraint that forbids repeating anything on it.
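The two-stage fix might look like the following sketch. The attempted list comes from the paragraph above; the function names, the record layout, and the example action are hypothetical.

```python
def record_action(state: dict, action: str, result: str) -> dict:
    """Reconcile: append a completed action to the attempted list."""
    attempted = list(state.get("attempted", []))
    attempted.append({"action": action, "result": result})
    return {**state, "attempted": attempted}


def repeat_constraints(state: dict) -> list[str]:
    """Constrain: one 'do not repeat' rule per action already attempted."""
    return [
        f"Do not repeat '{item['action']}'; it already ran with result: {item['result']}."
        for item in state.get("attempted", [])
    ]


state = {"attempted": []}
state = record_action(state, "send_password_reset_email", "sent")
print(repeat_constraints(state))
```

The constraint is only as good as the reconcile step feeding it: if record_action is never called when the tool returns, the list stays empty and there is nothing to forbid.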
Failure: state disagrees with the system of record
This is purely a reconcile failure. Capture, render, and constrain are all working on stale data. The fix is to wire reconcile to the authoritative event — a backend confirmation, a tool result — so canonical state stops drifting. The lesson recurs throughout the examples: the model must never be the source of canonical truth.
When the Framework Is Overkill
Honesty about scope keeps the framework from becoming dogma. Not every assistant needs all four stages formalized.
Where to scale down
- A two-turn helper needs at most light capture and render. Formalizing constrain and reconcile is wasted ceremony.
- A stateless summarizer needs none of it, because there is no conversation to track.
- A short, low-stakes FAQ bot can lean on a recent transcript and skip structured capture entirely.
The framework scales up cleanly to complex agents and scales down cleanly to simple bots. Its value is in giving you a shared way to reason about which stages a given assistant actually requires, which is itself a design decision worth making deliberately rather than by accident.
Frequently Asked Questions
Is this framework tied to a specific model or vendor?
No. Capture-Render-Constrain-Reconcile is a way of organizing concerns. It applies to any model that takes a text prompt and any storage layer you choose.
Which stage do teams most often skip?
Constrain. Teams build capture and render, then wonder why the model still re-asks questions and contradicts itself. Constraints anchored to state are the missing piece more often than not.
Do I need all four stages for a simple bot?
A trivial assistant might only need light capture and render. Constrain and reconcile become essential as stakes and conversation length rise.
How is this different from slot filling?
Slot filling is essentially capture plus render for a fixed set of slots. The framework generalizes it by adding constrain and reconcile, which is where most reliability problems actually live.
Where does the language model fit in this framework?
The language model is the consumer between render and capture. It reads rendered state and produces output. It never owns the canonical state; the system does, and reconcile keeps that state current.
Can I adopt the stages incrementally?
Yes. Most teams start with render, add constrain when contradictions appear, then formalize reconcile as authoritative events multiply.
Key Takeaways
- The Capture-Render-Constrain-Reconcile model separates four concerns teams usually tangle together.
- Capture turns raw input into structured fields; render injects the relevant subset each turn.
- Constrain uses state to forbid actions and is the stage most teams skip, to their cost.
- Reconcile keeps state synchronized with authoritative systems; the model never owns canonical state.
- Separating the stages makes the system testable, debuggable, and portable across tools.
- Adopt the stages incrementally, starting with render and adding constraints as problems surface.