Wiring Up Math a Model Can Actually Get Right

Agency Script Editorial

Editorial Team

·April 26, 2020·8 min read

Knowing that language models struggle with numbers is useful background. What you actually need is a process — a specific sequence of moves you can run every time a task involves calculation, so you stop relying on luck and start getting dependable answers. This piece is that process, laid out as ordered steps you can follow today.

The workflow assumes nothing fancy. It works whether you are using a basic chat interface or a system with code execution and tools. Where a more capable setup lets you skip or strengthen a step, the text says so. The point is to give you a default routine that turns numerical prompting from a coin flip into something predictable.

Each step builds on the last. Frame the problem clearly, force visible reasoning, split compound work into stages, offload the exact arithmetic where you can, verify before you trust, and save what worked. Run them in order and the error rate drops sharply. Skip steps when stakes are low; run all of them when a wrong number would cost you.

Step 1: State the Problem in Plain, Unambiguous Terms

Most numerical errors start before any calculation, in a poorly stated problem.

Pin Down Every Quantity and Unit

Write out exactly what each number means, including its unit. "Revenue grew 15 percent" is ambiguous: 15 percent of what, over what period, and is the figure you quoted the one before or after the growth? Spell it out so the model has nothing to guess at. Ambiguity is where wrong answers are born.

Say What You Want the Answer to Be

State the form of the result you expect: a dollar amount, a percentage rounded to one decimal, a count. A model that knows the target format is less likely to wander off into a different calculation than the one you meant.
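One way to make this framing habitual is to assemble the problem statement from explicit fields, so a missing unit or answer format is obvious before the prompt is sent. The sketch below is illustrative only: the `Quantity` class, the `frame_problem` helper, and the revenue figures are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Quantity:
    """One input to the problem, with its meaning and unit spelled out."""
    name: str
    value: str
    unit: str
    note: str = ""


def frame_problem(question: str, quantities: list[Quantity], answer_format: str) -> str:
    """Build a problem statement that leaves no quantity or unit implicit."""
    lines = ["Given:"]
    for q in quantities:
        detail = f" ({q.note})" if q.note else ""
        lines.append(f"- {q.name}: {q.value} {q.unit}{detail}")
    lines.append(f"Question: {question}")
    lines.append(f"Answer format: {answer_format}")
    return "\n".join(lines)


# Hypothetical figures, chosen only to illustrate the "15 percent" ambiguity.
prompt = frame_problem(
    question="What was revenue at the end of Q2?",
    quantities=[
        Quantity("Q1 revenue", "200000", "USD", "the figure before growth"),
        Quantity("Growth", "15", "percent", "of Q1 revenue, from Q1 to Q2"),
    ],
    answer_format="a dollar amount rounded to the nearest dollar",
)
print(prompt)
```

The note field is where the "of what, over what period" answers go, which is the part most often left out of a hand-typed prompt.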

Step 2: Require Step-by-Step Reasoning

With a clean problem statement, force the model to reason out loud rather than jump to an answer.

The Instruction to Add

Append a clear directive: "Work through this step by step. Show each calculation and state the result of each step before giving the final answer." This is the highest-value move in the whole workflow.
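If you build prompts in code, the directive can live in one constant so it is worded identically every time. This is a minimal sketch; the `require_reasoning` helper name is an assumption made for illustration.

```python
# The directive text is taken verbatim from the step above.
STEP_BY_STEP = (
    "Work through this step by step. Show each calculation and state the "
    "result of each step before giving the final answer."
)


def require_reasoning(problem_statement: str) -> str:
    """Append the reasoning directive after an already framed problem."""
    return f"{problem_statement.rstrip()}\n\n{STEP_BY_STEP}"


print(require_reasoning("A service costs 480 dollars per month. What does one year cost?"))
```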

Why Order Matters Here

Doing this after Step 1 matters because step-by-step reasoning on a vague problem just produces confident, well-structured nonsense. Clarity first, then visible reasoning. The conceptual background for why this works is in Getting Language Models to Do Math They Can Actually Trust.

Step 3: Split Compound Calculations Into Stages

If the task has multiple distinct operations, do not run them as one prompt.

Separate the Stages

Identify each distinct operation — compute a subtotal, apply a rate, adjust for a fee — and handle them as separate prompts or clearly separated sections. Carry the verified result of one stage into the next rather than letting the model thread everything internally.

Check Between Stages

Glance at each intermediate result before moving on. Catching a wrong subtotal early stops it from poisoning every calculation that follows. This is the practical version of the structure described in The FRAME Method for Numerical Reasoning Prompts.
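The stage-and-check pattern can be sketched in code as a loop that refuses to continue past an implausible intermediate result. In the sketch below each stage is a plain Python function standing in for a separate prompt or calculation; the `run_stages` helper, the stage labels, and the figures are all hypothetical.

```python
from decimal import Decimal
from typing import Callable

# Each stage: (label, operation on the running value, check on its result).
Stage = tuple[str, Callable[[Decimal], Decimal], Callable[[Decimal], bool]]


def run_stages(start: Decimal, stages: list[Stage]) -> dict[str, Decimal]:
    """Run stages in order, stopping at the first result that fails its check."""
    results: dict[str, Decimal] = {}
    value = start
    for label, operation, check in stages:
        value = operation(value)
        if not check(value):
            raise ValueError(f"Stage '{label}' produced an implausible value: {value}")
        results[label] = value
    return results


# Hypothetical example: a subtotal, then a rate, then a flat fee.
subtotal = Decimal("1000")
results = run_stages(
    subtotal,
    [
        ("apply rate", lambda v: v * Decimal("1.08"), lambda v: v > subtotal),
        ("add fee", lambda v: v + Decimal("25"), lambda v: v < Decimal("2000")),
    ],
)
for label, value in results.items():
    print(label, value)
```

Because the loop raises at the first failed check, a bad subtotal never reaches the stages that depend on it.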

Step 4: Offload Exact Arithmetic

The model should set up the math; something deterministic should perform it.

Use Code or Tools When Available

If your environment can run code, instruct the model to write and execute a short calculation rather than computing in its head. Code evaluates the expression deterministically instead of estimating it, and a decimal type keeps currency figures exact. This single step removes most arithmetic errors.
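As an illustration of the kind of short calculation worth asking for, the sketch below uses Python's standard `decimal` module, which avoids the rounding surprises of binary floating point for money-like values. The `apply_percentage` helper and its inputs are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_percentage(base: Decimal, percent: Decimal) -> Decimal:
    """Return percent of base, rounded to two decimal places."""
    raw = base * percent / Decimal("100")
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Binary floats cannot represent 0.1 or 0.2 exactly, so this comparison fails.
print(0.1 + 0.2 == 0.3)  # False

# Decimal values built from strings keep the digits as written.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

print(apply_percentage(Decimal("19.99"), Decimal("7.5")))  # 1.50
```

Building `Decimal` values from strings rather than floats matters here: `Decimal(0.1)` would carry the float's representation error into the calculation.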

When You Have No Tools

Without code execution, have the model produce the formula and the inputs clearly, then run the final arithmetic in a spreadsheet or calculator yourself. The model's reasoning is the valuable part; the exact computation is the part you should not trust it to do.

Step 5: Verify the Result

Before you use the number, confirm it.

Sanity and Bounds

Ask whether the answer is plausible and roughly the expected size. If you expected something near 400 and got 4,000, you have found an error worth chasing. Check that the result obeys obvious constraints — no negative counts, no percentages over 100 unless that makes sense.
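These checks are simple enough to write down once and reuse. The sketch below is one possible shape, assuming a hypothetical `sanity_check` helper and an arbitrary default of "within a factor of 3" for the size check; choose a threshold that fits your own task.

```python
from typing import Optional


def sanity_check(
    value: float,
    expected: float,
    tolerance_factor: float = 3.0,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> list[str]:
    """Return a list of problems; an empty list means the value looks plausible."""
    problems = []
    if minimum is not None and value < minimum:
        problems.append(f"{value} is below the minimum of {minimum}")
    if maximum is not None and value > maximum:
        problems.append(f"{value} is above the maximum of {maximum}")
    # Compare magnitudes only when both numbers are positive.
    if expected > 0 and value > 0:
        ratio = max(value / expected, expected / value)
        if ratio > tolerance_factor:
            problems.append(f"{value} is far from the expected size of about {expected}")
    return problems


print(sanity_check(4000, 400))          # one problem: wrong order of magnitude
print(sanity_check(410, 400))           # []
print(sanity_check(-3, 10, minimum=0))  # one problem: negative count
```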

Recompute the Hard Ones

For numbers that matter, compute the value a second way and compare. Two independent methods that agree give real confidence. Two that disagree have just saved you from acting on a mistake. The failure modes to watch for are catalogued in 7 Mistakes That Wreck Numerical Reasoning Prompts.
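A minimal sketch of recomputing a value by a second route, using a hypothetical discount calculation: one function subtracts the discount amount, the other multiplies by the remaining fraction. The function names and figures are illustrative. The two routes are algebraically equivalent, so agreement guards against slips in execution rather than against a wrong formula; a check done with a different tool or by a different person is stronger.

```python
from decimal import Decimal


def discounted_by_subtraction(base: Decimal, rate: Decimal) -> Decimal:
    """Compute the discount amount first, then subtract it."""
    return base - base * rate


def discounted_by_multiplication(base: Decimal, rate: Decimal) -> Decimal:
    """Multiply by the fraction that remains after the discount."""
    return base * (Decimal("1") - rate)


def agree(a: Decimal, b: Decimal, tolerance: Decimal = Decimal("0.01")) -> bool:
    """True when two results match to within the tolerance."""
    return abs(a - b) <= tolerance


first = discounted_by_subtraction(Decimal("250"), Decimal("0.2"))
second = discounted_by_multiplication(Decimal("250"), Decimal("0.2"))
print(first, second, agree(first, second))
```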

Step 6: Capture the Working Prompt

Once a sequence reliably produces good answers for a kind of task, do not reinvent it next time.

Save and Reuse

Keep the prompt structure that worked — the framing language, the step-by-step instruction, the verification ask — as a reusable template. Numerical tasks tend to recur in similar shapes, and a saved pattern means you run a proven process instead of starting fresh.
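A saved pattern can be as simple as a string template with named slots. The sketch below uses `string.Template` from the Python standard library; the template wording and the slot names (`problem`, `answer_format`, `sanity_check`) are assumptions made for illustration, and the example figures are hypothetical.

```python
from string import Template

# A reusable prompt pattern: framing, step-by-step instruction, verification ask.
NUMERIC_PROMPT = Template(
    "$problem\n\n"
    "Work through this step by step. Show each calculation and state the "
    "result of each step before giving the final answer.\n\n"
    "Answer format: $answer_format\n"
    "Before finishing, check that the result is plausible: $sanity_check"
)

prompt = NUMERIC_PROMPT.substitute(
    problem="An invoice lists 40 hours at 95 dollars per hour. What is the total?",
    answer_format="a dollar amount",
    sanity_check="it should be a few thousand dollars and not negative",
)
print(prompt)
```

`substitute` raises `KeyError` when a slot is left unfilled, which is a useful guard against sending a prompt with a missing answer format.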

Note the Stakes Tier

Record which version is the full, high-stakes routine and which is the lightweight one, so you can match effort to consequence quickly. Real applications of these saved patterns appear in Where Numerical Reasoning Prompts Earn Their Keep.
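One lightweight way to record the tiers is a lookup table next to the saved templates. The tier names and step labels below are hypothetical shorthand for the six steps in this piece; the low tier reflects the idea that clean framing plus step-by-step reasoning is often enough for a quick estimate.

```python
# Which steps each stakes tier runs. Labels are shorthand for Steps 1 to 6.
ROUTINES = {
    "low": ["frame", "reason"],
    "high": ["frame", "reason", "split", "offload", "verify", "capture"],
}


def steps_for(stakes: str) -> list[str]:
    """Return the steps to run for a tier, rejecting unknown tier names."""
    if stakes not in ROUTINES:
        raise ValueError(f"Unknown stakes tier: {stakes!r}")
    return list(ROUTINES[stakes])


print(steps_for("low"))
print(steps_for("high"))
```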

A Worked Run Through the Process

Seeing the steps applied to one task makes the sequence concrete in a way the abstract description cannot.

The Task

Suppose a client asks: a service costs 480 dollars per month, you are offering a 12 percent annual-prepay discount, and there is a one-time 60-dollar setup fee. What does the first year cost if they prepay?

The Process in Motion

Running the steps in order keeps every operation small and checkable:

  • Frame it. Annual base is 480 times 12, the discount applies to that base, the setup fee is added once after the discount, and the answer should be a dollar amount.
  • Reason in steps. Have the model compute the annual base (5,760), then the discount (691.20), then the discounted base (5,068.80), then add setup (5,128.80).
  • Split the stages. Each of those is its own checkable result rather than one tangled calculation.
  • Offload the arithmetic. If tools are available, the exact figures come from code rather than the model's estimation.
  • Verify. The total should be a bit under the undiscounted 5,820, which it is, and a recomputation confirms 5,128.80.

The lesson is that no single step is hard once the task is decomposed this way. Each move is small enough to trust, which is the whole point of the process. The same decomposition logic underpins Task Decomposition Is Quietly Retiring the Mega-Prompt.
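For reference, the worked figures can be reproduced with a few lines of code, which is the kind of calculation Step 4 suggests offloading. This is a sketch using the standard `decimal` module; the variable names are illustrative.

```python
from decimal import Decimal

monthly_price = Decimal("480")
discount_rate = Decimal("0.12")
setup_fee = Decimal("60")

# Each stage is its own named, checkable result.
annual_base = monthly_price * 12
discount = annual_base * discount_rate
discounted_base = annual_base - discount
first_year_total = discounted_base + setup_fee

# Verification: bound check against the undiscounted total, then a second route.
undiscounted_total = annual_base + setup_fee
second_route = annual_base * (1 - discount_rate) + setup_fee

print(annual_base)       # 5760
print(discount)          # 691.20
print(discounted_base)   # 5068.80
print(first_year_total)  # 5128.80
print(first_year_total < undiscounted_total)  # True
print(first_year_total == second_route)       # True
```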

Frequently Asked Questions

Do I have to run all six steps every time?

No. The full sequence is for tasks where a wrong number carries real cost. For a quick, low-stakes estimate, clean framing plus step-by-step reasoning is often enough. Match the depth of the process to how much an error would actually matter; over-applying it wastes effort.

What if I do not have access to code execution or tools?

You can still run the whole workflow except the automated arithmetic. Let the model handle framing, reasoning, and setting up the calculation, then perform the final exact arithmetic yourself in a spreadsheet or calculator. The reasoning steps are where the model adds value; the computation is what you should verify externally.

Why frame the problem before asking for reasoning?

Because step-by-step reasoning on an ambiguous problem produces a tidy, confident answer to the wrong question. Clarity has to come first. A precisely stated problem gives the model's reasoning something solid to work from, and it removes a major source of silent error before any calculation begins.

How much verification is enough?

Enough that, if the number turned out to be wrong in public, you could defend having stopped checking where you did. For internal estimates, a sanity check suffices. For numbers going to clients, in contracts, or into decisions with money attached, recompute them independently. The cost of verifying is almost always far less than the cost of acting on a bad figure.

Can I combine the steps into a single prompt?

For simple tasks, yes — you can ask for a clearly framed problem, step-by-step reasoning, and a sanity check in one message. For compound calculations, separating the stages into distinct prompts gives you cleaner intermediate results and easier debugging. The more steps the math has, the more value there is in keeping them apart.

Key Takeaways

  • Numerical errors often begin with an ambiguous problem, so state every quantity, unit, and the expected answer format first.
  • Forcing step-by-step reasoning is the highest-value move once the problem is clearly framed.
  • Splitting compound calculations into checkable stages stops early errors from corrupting later ones.
  • Offload exact arithmetic to code, tools, or a calculator, leaving the model to set up the problem rather than compute it.
  • Verify with sanity checks and independent recomputation, then save the working prompt as a reusable template tiered by stakes.
