Most cost overruns on AI projects come from one thing: nobody did the arithmetic before building. The model worked in a demo, everyone got excited, it shipped, and the first full month's invoice was a shock. This article fixes that by giving you a concrete, sequential process you can run today, on any workload, in under an hour.
We are not going to philosophize about pricing. We are going to walk through eight numbered steps that take you from a vague idea to a defensible monthly budget. Follow them in order. Each step produces a number that feeds the next, and the final number tells you whether your design is affordable before you write a single line of production code.
Have a calculator and the pricing page of your chosen provider open. Let's go.
Step 1: Define One Representative Request
Pick the single most common request your application will make. Not the biggest, not the smallest — the typical one. If you are building a support chatbot, that is "a user asks one question and gets one answer." If you are building a document classifier, it is "one document in, one label out."
Everything that follows is computed for this one request and then scaled. Getting the representative request right is the foundation, so be honest about what "typical" looks like in real usage, not in your tidy demo.
Step 2: Count the Input Tokens
Add up everything you send to the model for that one request:
- The system prompt or instructions.
- Any retrieved context, documents, or knowledge base content.
- Conversation history, if the request includes prior turns.
- The user's actual input.
Convert words to tokens using the rule of thumb that, for English text, 750 words is about 1,000 tokens. Most people drastically underestimate this number because they forget the context and history, which are often far larger than the user's actual message.
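If you prefer a script to a calculator, Step 2 can be sketched in Python as follows. This is a minimal sketch that assumes you know the rough word count of each input component. The component names and word counts are hypothetical placeholders, and the conversion is the same 750-words-per-1,000-tokens rule of thumb, not a real tokenizer. Your provider's own token-counting tools, if it offers them, will be more precise.

```python
import math

# Rule of thumb from Step 2: about 750 words per 1,000 tokens (English text).
WORDS_PER_1K_TOKENS = 750


def words_to_tokens(words: int) -> int:
    """Approximate token count for a word count, rounded up."""
    return math.ceil(words * 1000 / WORDS_PER_1K_TOKENS)


def count_input_tokens(component_words: dict[str, int]) -> dict[str, int]:
    """Convert each input component to tokens and append a total."""
    tokens = {name: words_to_tokens(w) for name, w in component_words.items()}
    tokens["total"] = sum(tokens.values())
    return tokens


# Hypothetical word counts for one representative support-chat request.
breakdown = count_input_tokens({
    "system_prompt": 300,
    "retrieved_context": 900,
    "history": 150,
    "user_input": 30,
})
for name, count in breakdown.items():
    print(f"{name:>17}: {count:>5,} tokens")
```

In this made-up mix, the user's 30 words come to about 2 percent of the 1,840-token total. The instructions, context, and history carry almost all of the input cost.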
Step 3: Count the Output Tokens
Estimate how much text the model generates in a typical response. A one-line answer is around 30 tokens; a paragraph is 150; a full page is 500. If your application generates structured data or long reports, this number can dominate your bill, so estimate it carefully rather than guessing low.
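One way to estimate output "carefully rather than guessing low" is to describe a typical response as a count of the rough sizes above and add them up. The sketch below uses the approximate figures from this step (30, 150, and 500 tokens). The response shapes are hypothetical examples.

```python
# Approximate output sizes from Step 3; treat them as rough reference points.
OUTPUT_SIZES = {"one_line": 30, "paragraph": 150, "page": 500}


def estimate_output_tokens(parts: dict[str, int]) -> int:
    """Estimate output tokens for a response described as counts of each size."""
    return sum(OUTPUT_SIZES[kind] * count for kind, count in parts.items())


# Hypothetical response shapes.
chat_answer = estimate_output_tokens({"paragraph": 2, "one_line": 1})  # 330
report = estimate_output_tokens({"page": 3})  # 1500
print(chat_answer, report)
```

A short chat answer and a three-page report differ by almost 5x in output tokens, which is why long or structured output deserves its own careful estimate.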
Step 4: Pull the Per-Token Rates
From your provider's pricing page, record two numbers for your chosen model: the input price per million tokens and the output price per million tokens. If you have not chosen a model yet, grab the rates for two candidates — one small and one flagship — so you can compare in the next step. Our Complete Guide explains how the model families differ.
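To keep the two candidates side by side, you can record their rates in a small structure like this sketch. The model names and prices are hypothetical placeholders, not quotes from any provider, so replace them with the numbers from your pricing page. The ratio helper is an illustrative addition that shows how many input tokens cost the same as one output token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Per-million-token prices copied from a provider's pricing page (USD)."""
    name: str
    input_per_million: float
    output_per_million: float

    def output_to_input_ratio(self) -> float:
        """How many input tokens cost the same as one output token."""
        return self.output_per_million / self.input_per_million


# Placeholder rates for two hypothetical candidates; replace with real quotes.
CANDIDATES = [
    ModelRates("small-model", input_per_million=0.25, output_per_million=1.25),
    ModelRates("flagship-model", input_per_million=3.00, output_per_million=15.00),
]

for m in CANDIDATES:
    print(f"{m.name}: ${m.input_per_million}/M in, ${m.output_per_million}/M out, "
          f"output costs {m.output_to_input_ratio():.0f}x input")
```

With these placeholder prices, each output token costs as much as five input tokens, which is why underestimating output in Step 3 can distort the whole budget.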
Step 5: Compute the Cost of One Request
Now combine the pieces:
- Input cost = (input tokens ÷ 1,000,000) × input price.
- Output cost = (output tokens ÷ 1,000,000) × output price.
- Per-request cost = input cost + output cost.
Write this number down. For most well-designed requests it lands somewhere between a tenth of a cent and a few cents. If yours is much higher, you have found a problem early — exactly the point of this exercise.
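The three formulas above translate directly into a function. In this sketch, the request size (4,000 input and 500 output tokens) and the two sets of prices are hypothetical placeholders, chosen to show the small-versus-flagship comparison suggested in Step 4.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Step 5: per-request cost in USD, with prices quoted per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# One hypothetical request (4,000 tokens in, 500 out) priced on two
# placeholder models: (name, input $/M, output $/M).
candidates = [("small-model", 0.25, 1.25), ("flagship-model", 3.00, 15.00)]
for name, in_price, out_price in candidates:
    cost = request_cost(4_000, 500, in_price, out_price)
    print(f"{name}: ${cost:.6f} per request ({cost * 100:.2f} cents)")
```

Both results fall in the tenth-of-a-cent to few-cents range, but the flagship request costs 12 times as much. That gap only matters once Step 6 multiplies it by volume.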
A worked example
Suppose your representative request sends 2,000 input tokens (instructions, a retrieved document, and the user's question) and produces a 300-token answer, on a model priced at $3 per million input tokens and $15 per million output tokens. Input cost is (2,000 ÷ 1,000,000) × $3 = $0.006. Output cost is (300 ÷ 1,000,000) × $15 = $0.0045. The per-request cost is $0.0105, roughly a cent. That single number, computed in under a minute, is the seed of your entire budget. If it had come out to ten cents instead, you would know immediately that either the model or the prompt size needs to change before you build anything.
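You can also run the same arithmetic in reverse and ask how large the prompt could grow before one request hits a cost ceiling. The sketch below uses Python's `decimal` module so the example's figures come out exactly, without float rounding. The $0.10 ceiling mirrors the "ten cents" scenario above. `max_input_tokens` is a hypothetical helper name, not part of any library.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: Decimal, output_price: Decimal) -> Decimal:
    """Per-request cost in USD, using exact decimal arithmetic."""
    return (Decimal(input_tokens) / MILLION * input_price
            + Decimal(output_tokens) / MILLION * output_price)


def max_input_tokens(target_cost: Decimal, output_tokens: int,
                     input_price: Decimal, output_price: Decimal) -> int:
    """Largest input size that keeps one request at or under target_cost."""
    output_cost = Decimal(output_tokens) / MILLION * output_price
    remaining = target_cost - output_cost
    return int(remaining / input_price * MILLION)  # int() truncates toward zero


# Numbers from the worked example: 2,000 in, 300 out, $3 / $15 per million.
cost = request_cost(2_000, 300, Decimal("3"), Decimal("15"))
print(cost)  # 0.0105
ceiling = max_input_tokens(Decimal("0.10"), 300, Decimal("3"), Decimal("15"))
print(ceiling)
```

At these rates, the prompt would have to grow to roughly 31,800 tokens, about 16 times the example's 2,000, before a single request costs ten cents.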
Step 6: Scale to Monthly Volume
Multiply your per-request cost by realistic request volume:
- Per-request cost × requests per day × 30 = baseline monthly cost.
Use your honest expected volume, then compute a second number at 3x that volume as a stress test. AI usage tends to grow faster than teams expect once a feature is live, and a budget that only survives at launch volume is not a budget.
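Both Step 6 numbers take only a few lines to compute. The sketch below reuses the worked example's $0.0105 per request and assumes a hypothetical volume of 5,000 requests per day. The 30-day month and the 3x factor come straight from this step.

```python
def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Step 6 baseline: per-request cost x requests per day x 30."""
    return per_request * requests_per_day * days


def volume_scenarios(per_request: float, requests_per_day: int,
                     stress_factor: int = 3) -> dict[str, float]:
    """Baseline monthly cost plus a stress test at a multiple of expected volume."""
    return {
        "baseline": monthly_cost(per_request, requests_per_day),
        f"stress_{stress_factor}x": monthly_cost(per_request,
                                                 requests_per_day * stress_factor),
    }


# Worked-example cost ($0.0105) at a hypothetical 5,000 requests per day.
scenarios = volume_scenarios(0.0105, 5_000)
for label, dollars in scenarios.items():
    print(f"{label}: ${dollars:,.2f} per month")
```

At that volume the feature costs about $1,575 a month at launch and about $4,725 under the stress test. The second number is the one to compare against what you can actually spend.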
Step 7: Apply Discounts You Can Actually Use
The baseline assumes full price with no optimization. Now apply the discounts that fit your workload:
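- Prompt caching: if a large part of your input repeats across requests (a stable system prompt or knowledge base), the repeated portion can be billed at a steep discount, often 75 to 90 percent off depending on the provider and model. Check whether writing to the cache carries a surcharge. Caching often cuts the input side of your bill dramatically.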
- Batch processing: if responses can wait minutes to hours, batch APIs run at roughly half price.
- Model downgrade: re-run steps 4 and 5 with a smaller model to see if quality holds.
Recompute your monthly cost with the applicable discounts. The gap between your Step 6 and Step 7 numbers is your optimization headroom. The pitfalls of skipping these are detailed in our Common Mistakes guide.
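To see how much headroom caching and batching would actually buy, recompute the worked example under a few scenarios. This is a simplified sketch built on stated assumptions. The 1,500 cacheable tokens and the 90 percent cache discount are hypothetical. Cache-write surcharges are ignored. The model also treats the batch discount (50 percent) as stacking on top of caching, which you should confirm with your provider.

```python
def discounted_cost(input_tokens: int, output_tokens: int,
                    input_price: float, output_price: float,
                    cached_tokens: int = 0, cache_discount: float = 0.90,
                    batch: bool = False) -> float:
    """Per-request cost in USD after prompt caching and optional batch pricing.

    Simplifying assumptions: cached_tokens are cache hits billed at
    (1 - cache_discount) of the input price, cache-write surcharges are
    ignored, and the 50 percent batch discount stacks on top of caching.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    billable_input = uncached + cached_tokens * (1 - cache_discount)
    cost = (billable_input * input_price + output_tokens * output_price) / 1_000_000
    return cost * 0.5 if batch else cost


# Worked example (2,000 in / 300 out at $3 / $15), assuming 1,500 input
# tokens are a stable, cacheable system prompt and knowledge base.
scenarios = {
    "full price": discounted_cost(2_000, 300, 3.0, 15.0),
    "with caching": discounted_cost(2_000, 300, 3.0, 15.0, cached_tokens=1_500),
    "caching + batch": discounted_cost(2_000, 300, 3.0, 15.0,
                                       cached_tokens=1_500, batch=True),
}
for label, cost in scenarios.items():
    print(f"{label:>16}: ${cost:.5f} per request")
```

Caching only touches the input side. Even a 90 percent cache discount therefore trims this hypothetical request by under 40 percent, because the output tokens still cost full price. Batching is the lever that reaches the output side.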
Step 8: Set a Budget and Instrument It
Take your Step 7 number, add a 30 percent buffer for the requests you forgot, and that is your initial monthly budget. Then make it observable:
- Log token counts and the model used on every request.
- Tag spend by feature so you know where money goes.
- Set an alert at 70 percent of budget.
A budget you cannot see in real time is a budget you will blow through. Our Checklist turns this monitoring setup into a tickable list.
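The instrumentation in this step can start as something very small. The sketch below is an in-memory illustration only: `SpendTracker`, the feature tags, and the model name are all hypothetical. In practice you would take token counts from your provider's API responses and send spend to whatever metrics and alerting system you already run. The tracker sets the budget at the Step 7 figure plus the 30 percent buffer, logs every request, and warns once when spend crosses 70 percent of that budget.

```python
import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ai_spend")


class SpendTracker:
    """Sketch of Step 8: log tokens and model on every request, tag spend by
    feature, and warn once when spend crosses a share of the budget."""

    def __init__(self, optimized_monthly_cost: float,
                 rates: dict[str, tuple[float, float]],
                 buffer: float = 0.30, alert_at: float = 0.70):
        self.budget = optimized_monthly_cost * (1 + buffer)  # Step 7 + buffer
        self.alert_at = alert_at
        self.rates = rates  # model name -> (input $/M, output $/M)
        self.by_feature: dict[str, float] = defaultdict(float)
        self.alerted = False

    def record(self, feature: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        """Log one request, add its cost to the feature's tally, maybe alert."""
        in_price, out_price = self.rates[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.by_feature[feature] += cost
        log.info("feature=%s model=%s input=%d output=%d cost=$%.6f",
                 feature, model, input_tokens, output_tokens, cost)
        spent = sum(self.by_feature.values())
        if not self.alerted and spent >= self.alert_at * self.budget:
            self.alerted = True  # fire once rather than on every later request
            log.warning("spend $%.4f crossed the %.0f%% alert threshold "
                        "of the $%.4f budget",
                        spent, self.alert_at * 100, self.budget)
        return cost


# Deliberately tiny budget so the alert fires in this demo.
tracker = SpendTracker(0.01, rates={"flagship-model": (3.00, 15.00)})
tracker.record("support_chat", "flagship-model", 1_000, 100)
tracker.record("doc_classifier", "flagship-model", 2_000, 300)
print(dict(tracker.by_feature), "alerted:", tracker.alerted)
```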
One last habit: revisit these eight steps whenever the workload changes materially — a new model, a larger context, a spike in volume. The estimate you ran at launch decays as reality drifts from your assumptions, and a thirty-minute re-run is far cheaper than discovering the drift on an invoice.
Frequently Asked Questions
How accurate will my estimate be?
A careful estimate following these steps usually lands within 20 to 30 percent of reality, which is more than good enough to decide whether a design is affordable. The biggest source of error is underestimating input tokens — people forget context and history. Recount Step 2 if your real bill surprises you.
What if my requests vary wildly in size?
Run Steps 2 through 5 twice, once for a small request and once for a large one, then estimate the mix. If 80 percent of requests are small and 20 percent are large, weight the two per-request costs accordingly. The resulting blended average works fine for budgeting.
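The weighting is a simple weighted average. In the sketch below, the 80/20 split comes from the answer above, while the two per-request costs ($0.002 and $0.03) are hypothetical placeholders.

```python
def blended_cost(mix: list[tuple[float, float]]) -> float:
    """Weighted per-request cost from (share_of_requests, per_request_cost) pairs."""
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total_share}")
    return sum(share * cost for share, cost in mix)


# Hypothetical mix: 80% small requests at $0.002, 20% large ones at $0.03.
mix = [(0.8, 0.002), (0.2, 0.03)]
average = blended_cost(mix)
large_share_of_spend = 0.2 * 0.03 / average
print(f"blended: ${average:.4f} per request; "
      f"large requests = {large_share_of_spend:.0%} of spend")
```

In this example, the large requests are only a fifth of the traffic but account for about four-fifths of the spend. Prompt trimming or caching on that slice is where savings would come from.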
Should I do this before or after picking a model?
Before, ideally. Running Step 5 for both a small and a flagship model is often what reveals that the cheaper model is affordable and the expensive one is not. Cost estimation should inform model choice, not just confirm it after the fact.
Where do most people's estimates go wrong?
Underestimating input tokens and forgetting volume growth. Context, retrieved documents, and conversation history are usually larger than the user's actual message, and live usage tends to climb faster than expected. The 3x stress test in Step 6 protects against both.
Do I need special software to do this?
No. A calculator and the provider's pricing page are enough for the estimate itself. You only need tooling for Step 8, where you instrument live spend — and most providers expose token counts in their API responses, so logging them is straightforward.
Key Takeaways
- Start from one representative request, not your best-case demo.
- Count input tokens carefully — context and history usually dwarf the user's message.
- Compute per-request cost, then scale by volume and stress-test at 3x.
- Apply caching, batching, and model downgrades to find your real budget.
- Add a 30 percent buffer and instrument spend so the budget stays visible.
- The whole process takes under an hour and prevents the classic first-invoice shock.