Most checklists are forgotten the moment you read them because they list items without explaining why the items matter. A checklist you cannot reason about is a checklist you will skip the first time you are in a hurry. This one pairs each item with a short justification, so when you are tempted to skip a step you know exactly what you are risking.
Use it two ways. When building a new staged reasoning prompt, work down the list in order. When reviewing an existing one, scan for the items it violates. Either way, treat it as a working tool rather than a wall poster, something you actually run a prompt against before shipping it.
The items are grouped by phase: before you write, while you write, and after you write. The order is deliberate, because skipping an early item usually causes a later one to fail.
Before You Write
The work you do before drafting determines whether the draft can succeed at all.
Define what a good answer looks like
Write one or two sentences describing a correct, complete answer. Justification: you cannot produce or test an answer you cannot describe, and this description doubles as your acceptance test later.
List the natural steps
Write out the stages a careful human would take to solve the problem. Justification: the steps you take by hand are the steps the prompt needs, and finding them now prevents a vague "reason carefully" later.
Gather every input
Collect all facts, numbers, and constraints the task requires. Justification: a model fills missing inputs with plausible guesses, producing confident wrong answers, as the examples article shows repeatedly.
While You Write
Drafting is where structure becomes wording.
Name the steps explicitly
Turn your step list into numbered instructions. Justification: named steps cut run-to-run variance, while generic nudges let the model invent a different structure each time.
Order steps by dependency
Sequence so every step's inputs exist before it runs. Justification: a step that needs an earlier result will guess or contradict itself if placed too early.
Mark hard versus soft constraints
State which requirements are non-negotiable and which are preferences. Justification: without the distinction, the model may trade away something essential to optimize something optional.
Separate the final answer
Put the conclusion under a labeled heading. Justification: this lets software extract it and humans skim to it, and it lets you keep or discard the reasoning cleanly, a habit from the best practices guide.
After You Write
A draft is a hypothesis until you test it.
Run a known-answer test set
Test on three to thirty cases whose correct answers you already know. Justification: this is the only way to tell whether the prompt works, versus merely looks thorough.
Locate the breaking step on failures
When an answer is wrong, read the reasoning and find the exact step that failed. Justification: this tells you precisely what to fix instead of forcing a blind rewrite.
Trim steps that do not change outcomes
Remove each step and rerun; keep only those that affect results. Justification: dead steps add cost, latency, and failure points, the trap described in the common mistakes article.
Document why each step exists
Leave a short note beside each instruction. Justification: future editors who do not know which steps are load-bearing will eventually remove one and break the prompt.
Using the Checklist Over Time
A checklist is most valuable when it becomes routine.
Review on every meaningful edit
Whenever you change a prompt, rerun the test set and re-scan the list. Justification: edits made under pressure are exactly when these items get skipped and regressions slip in.
Revisit inputs when the task shifts
If the underlying task changes, return to the "gather every input" item first. Justification: most prompt failures after a task change trace back to an input that is now missing or outdated. For the deeper reasoning model behind these items, see the framework article.
Adapting the Checklist to Your Stakes
A checklist is not a single fixed ritual. The same list should feel different depending on how much rides on the prompt, and learning to flex it is what keeps it from becoming bureaucracy.
The lightweight pass
For a throwaway prompt you will run once, the honest minimum is three items: define a good answer, gather every input, and glance at the output to see whether it matches your target. The rest of the list addresses problems that only appear when a prompt is reused, so applying it to a one-off is wasted motion. Knowing which items to drop is as important as knowing the items.
The production pass
For a prompt that runs many times a day, every item earns its place, and two deserve extra weight. The known-answer test set becomes non-negotiable, because at volume an undetected regression compounds quietly across thousands of runs. And the documentation item becomes a safeguard, because production prompts are edited by people who did not write them, often under pressure. A note beside each step is the difference between a safe edit and a silent break.
The handoff pass
When you give a prompt to a teammate, add one item to the list: walk them through why each step exists before they touch it. The checklist captures the what; a five-minute conversation captures the judgment behind it. Prompts that change hands without that conversation tend to lose their load-bearing steps within a few edits.
Turning the Checklist Into a Habit
The best checklist is the one you stop having to consciously consult because its items have become reflexes.
Run it the same way every time
Consistency is what converts a list into a habit. If you always start a new prompt by writing the "good answer" sentence and always finish by trimming dead steps, those bookends will eventually feel wrong to skip. The structure trains you, not just the prompt.
Let failures refine the list
When a prompt fails in a way the checklist did not catch, add an item. Your list should grow to reflect the specific failure modes of your own work, not stay frozen as a generic template. Over time it becomes a record of every mistake you have learned not to repeat, which is the most valuable form a checklist can take.
Frequently Asked Questions
Do I have to do every item for every prompt?
For quick, low-stakes prompts you can skip the testing and trimming items. For anything that runs repeatedly or matters, work the full list. The "define a good answer" and "gather every input" items are worth doing always, because skipping them causes the most failures.
What order should I work the checklist in?
Top to bottom, by phase. The before-you-write items make the writing items possible, and the writing items make the after-you-write items meaningful. Skipping ahead usually means circling back when a later item fails.
How big should the test set be?
Three to five cases for a small prompt, up to a few dozen for high-stakes or high-volume work. The goal is enough coverage of tricky cases that a passing prompt genuinely earns your trust.
Why is documenting each step on the list?
Because prompts get edited later, often by someone who does not remember why each instruction exists. A note beside each step prevents an editor from removing a load-bearing instruction and silently breaking the prompt.
How often should I rerun the checklist?
On every meaningful edit, and whenever the underlying task changes. Most regressions appear right after a hurried edit, which is exactly when running the list catches them before they ship.
Key Takeaways
- Each checklist item carries a justification so you know what you risk when you skip it.
- Before writing, define a good answer, list the natural steps, and gather every input.
- While writing, name and order the steps, mark hard versus soft constraints, and separate the final answer.
- After writing, run a known-answer test set, locate the breaking step on failures, trim dead steps, and document why each remains.
- The before-you-write items are worth doing for every prompt, because skipping them causes the most failures.
- Rerun the list on every meaningful edit and whenever the task changes, since that is when regressions slip in.