Once you understand that generation works best as a loop rather than a single shot, a harder question takes over. Not all loops are equal. Some converge quickly on something excellent. Others wander, oscillate between two mediocre versions, or quietly degrade with each pass until the fifth attempt is worse than the first. The difference is not luck. It is structure.
This piece assumes you already know the fundamentals: that you draft, critique, revise, and repeat. What it adds is the engineering underneath that practice. We will look at why loops drift, how to design a critique step that actually pulls toward a target, what to do when the model fights your corrections, and how to recognize the edge cases where iteration is the wrong tool entirely.
The goal is reliability. An advanced practitioner does not just iterate; they iterate in a way that predictably arrives somewhere good, and they know in advance roughly how many passes that will take.
Why Loops Drift Instead of Converging
Convergence is not automatic. A loop only converges when each pass reduces the distance between the current output and your target. Many loops fail that condition without the operator noticing.
The moving-target problem
The most common cause of drift is a target that shifts between passes. You ask for "more concise" on pass two, "warmer" on pass three, and "more authoritative" on pass four. Each instruction is reasonable, but together they pull in different directions. The model satisfies the latest request by undoing an earlier one.
A convergent loop holds a fixed specification and measures every draft against it. You can refine the specification, but you do so deliberately and rarely, not as a side effect of reacting to whatever you happen to dislike in the latest draft.
Critique that describes instead of directs
A weak critique says "this is too generic." A strong one says "paragraph two makes a claim with no example; add a concrete one from the manufacturing case." The first is a feeling. The second is an instruction the model can act on without guessing.
When your critique step produces feelings rather than directives, the model fills the gap with its own interpretation, and that interpretation changes every pass. That variance is your drift.
Designing a Critique Step That Pulls
The critique stage is where advanced practitioners earn their results. Most of the leverage in a loop lives here, not in the generation prompt.
Separate the critic from the author
Have the model critique in a distinct turn or even a distinct context from the one that generated the draft. When the same context both writes and judges, it tends to defend its own choices. A fresh critic with only the draft and the specification is far more willing to find real faults.
You can formalize this by giving the critic a rubric: a short list of named criteria, each with a pass or fail and a one-line reason. This forces specificity and gives you a record of what changed pass over pass.
Rank faults before fixing them
Not every flaw deserves a pass. Fixing a minor word choice while a structural problem remains wastes a cycle. Have the critique stage order its findings by impact, and address only the top one or two per revision. This keeps each pass focused and prevents the model from making sweeping edits that introduce new problems while solving small ones.
For a structured approach to organizing these criteria, the Building a Robust Refinement Loop Framework piece offers a reusable scaffold.
Handling the Hard Cases
The fundamentals work on cooperative problems. Advanced work means knowing what to do when the loop misbehaves.
When the model fights a correction
Sometimes you correct the same thing twice and it comes back unchanged. Usually this means your instruction conflicts with something stronger in the prompt or with the model's defaults. The fix is rarely to repeat yourself louder. Instead, find the competing instruction and resolve the conflict explicitly, or move the stubborn requirement into a constraint the model checks against rather than a preference buried in prose.
When passes oscillate
If your output flips between two versions, you are almost certainly issuing contradictory critiques on alternating passes. Freeze one of the two competing properties as a hard constraint, iterate only on the other, then unfreeze. Oscillation is a symptom of an under-specified target, and the cure is to remove a degree of freedom.
When to abandon the loop
Iteration is the wrong tool when the first draft is fundamentally off-strategy rather than imperfect in execution. No amount of revision turns a wrong premise into a right one. Recognizing this early saves cycles. If two passes have not improved the core, the problem is upstream in your specification or your choice of approach, and you should restart rather than continue. The common mistakes discussion covers several other failure signatures worth memorizing.
Controlling Cost and Pass Count
Every pass costs time, tokens, and attention. An advanced operator treats pass count as a budget.
Estimate before you start
Experienced practitioners can predict, roughly, how many passes a task needs. A short factual rewrite converges in one or two. A nuanced strategy document with competing constraints might take four or five. Setting that expectation up front prevents both premature stopping and endless polishing.
Front-load the expensive decisions
The most costly thing to change late is structure. If you let the model commit to an outline early and only refine within it, you avoid expensive restructuring passes. Lock the skeleton first, then iterate on the flesh. This single discipline often halves total pass count.
Stopping Criteria That Actually Stop
The hardest skill in iteration is knowing when to stop. Without a rule, you either quit too early or polish forever.
Define done before you begin
Write your acceptance criteria before the first draft exists. When every rubric item passes, you are done, full stop. This removes the temptation to keep tweaking because something could theoretically be marginally better. For metrics on what "good enough" looks like in practice, the metrics piece is a useful companion.
Watch for diminishing returns
When a pass changes wording but not substance, the loop has converged and further iteration is noise. Track whether each pass resolves a rubric item. The moment a pass resolves nothing, stop.
Frequently Asked Questions
How many refinement passes is too many?
There is no fixed number, but if a pass resolves no item on your acceptance rubric, you have gone one too far. For most tasks, convergence happens within two to four passes. If you routinely need more than five, your initial specification is probably too vague, and tightening it will cut your pass count.
Should I use the same model for generation and critique?
You can, but separate the contexts. The same model judging its own fresh output in a clean context is far more critical than continuing the conversation that produced the draft. Some practitioners use a stronger model for critique and a faster one for generation, which works well when critique quality is the bottleneck.
How do I stop a loop from oscillating between two versions?
Oscillation means you are issuing contradictory critiques on alternating passes. Freeze one of the two competing properties as a hard constraint, iterate on the other until it is right, then revisit the frozen one. The oscillation disappears once the target stops moving.
Is iterative refinement worth it for simple tasks?
Often not. A one-line factual answer or a simple reformatting rarely benefits from a loop. Reserve iteration for outputs where quality is multidimensional and the cost of being wrong is real. For routine work, a single well-specified prompt is faster and good enough.
What is the single most common advanced mistake?
Letting the target drift. Practitioners react to each draft with a new preference, and the cumulative effect is a loop that never converges because it never has a stable goal. Fix the specification, measure against it, and resist redefining "good" mid-loop.
Can I automate a refinement loop end to end?
Partly. You can automate the generate-critique-revise cycle with a fixed rubric and a stopping rule, and this works well for high-volume, well-bounded tasks. But automated loops inherit any weakness in your rubric, so the human work moves upstream into defining the criteria precisely.
Key Takeaways
- Convergence requires a fixed target; drift almost always traces back to a goal that shifts between passes.
- The critique step is where the leverage lives. Make it specific, directive, and ranked by impact rather than vague and reactive.
- Separate the critic from the author so the model judges its draft honestly instead of defending it.
- Treat pass count as a budget, lock structure early, and front-load the expensive decisions.
- Define your stopping criteria before the first draft, and stop the moment a pass resolves nothing on the rubric.
- Iteration is the wrong tool when the premise is wrong; restart rather than revise a fundamentally off-strategy draft.