When you walk into a budget meeting, nobody wants to hear about parameter counts or licensing philosophy. The decision-maker wants three numbers: what it costs, what it returns, and how long until it pays back. If you cannot produce those, the project does not get funded — regardless of how good the technology is.
This guide shows how to build the financial case for open versus closed models the way a finance team will actually evaluate it. The headline: per-token pricing is the smallest part of the story, and teams that lead with it lose the room.
The Two Cost Curves
The fundamental ROI difference is the shape of the cost curve. Understand this and the rest follows.
Closed: variable cost, zero fixed cost
Closed APIs are pure pay-per-use. No GPUs to buy, no engineers to staff for inference, no idle capacity. Your cost rises linearly with usage. This is ideal when volume is low, spiky, or uncertain — you pay only for what you use and the business case is trivial to model.
Open: fixed cost, low marginal cost
Self-hosted open models invert that profile. You pay for GPUs and engineering whether you serve one request or a million. Once that fixed cost is covered, each additional request is nearly free. Self-hosting wins decisively at high, steady volume and loses badly at low utilization, where you pay for idle hardware.
The crossover point is the entire business case. Below it, closed is cheaper. Above it, open is. Your job is to find where your volume sits relative to that line. The metrics guide shows how to calculate effective per-token cost for a self-hosted setup.
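As a rough illustration of the crossover, the sketch below solves for the monthly volume at which the closed bill equals the open cost. It assumes a simplified model: the closed bill is purely linear in volume, and the open cost is a fixed monthly amount plus a small marginal cost per token. The function name and every dollar figure are hypothetical placeholders, not real prices.

```python
def crossover_mtok_per_month(fixed_monthly_cost, closed_price_per_mtok,
                             open_marginal_per_mtok=0.0):
    """Monthly volume (millions of tokens) where self-hosting matches the API bill.

    Solves: closed_price * v = fixed_monthly_cost + open_marginal * v
    """
    saving_per_mtok = closed_price_per_mtok - open_marginal_per_mtok
    if saving_per_mtok <= 0:
        raise ValueError("open never catches up at these rates")
    return fixed_monthly_cost / saving_per_mtok


# Hypothetical inputs, not real prices.
FIXED_MONTHLY = 12_000.0      # GPUs plus ops engineering, USD per month
API_PRICE_PER_MTOK = 5.0      # closed API, USD per million tokens
OPEN_MARGINAL_PER_MTOK = 0.5  # assumed marginal cost of self-hosting, USD per million tokens

breakeven = crossover_mtok_per_month(
    FIXED_MONTHLY, API_PRICE_PER_MTOK, OPEN_MARGINAL_PER_MTOK
)
print(f"Crossover at about {breakeven:,.0f}M tokens per month")
```

If your projected volume sits well below the printed figure, the closed option is cheaper under these assumptions; well above it, open is.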
Building Total Cost of Ownership
A credible business case prices everything, not just inference.
Closed model TCO
- Per-token API spend at projected volume, with headroom for growth
- Engineering time for integration and prompt work (modest)
- Vendor risk premium — price-change and rate-limit exposure
Open model TCO
- GPU cost: rented hourly or amortized hardware
- Utilization penalty: idle GPUs are pure waste; model realistic load
- Engineering: inference infrastructure, scaling, monitoring, on-call
- Fine-tuning and experimentation compute
The honest open TCO is almost always higher than the napkin math suggests, because idle time and ops are easy to forget and expensive to ignore. The risks article details the operational costs people overlook.
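One way to keep both TCO lists honest is to write each line item down as an explicit field, so nothing is silently left out. The sketch below is a minimal version of that idea. The class names, field names, and numbers are all hypothetical, and it assumes always-on rented GPUs, so idle time is already inside the GPU bill and is reported separately only to make the waste visible.

```python
from dataclasses import dataclass


@dataclass
class ClosedTCO:
    monthly_mtok: float             # projected volume, millions of tokens
    price_per_mtok: float           # API price, USD per million tokens
    growth_headroom: float          # 0.2 means budgeting 20% above projection
    integration_eng_monthly: float  # integration and prompt work, USD per month
    vendor_risk_premium: float      # fraction added for price-change exposure

    def monthly_total(self) -> float:
        api_spend = self.monthly_mtok * self.price_per_mtok * (1 + self.growth_headroom)
        return api_spend * (1 + self.vendor_risk_premium) + self.integration_eng_monthly


@dataclass
class OpenTCO:
    gpu_hourly_rate: float           # rented GPU, USD per hour
    gpu_count: int
    ops_eng_monthly: float           # infrastructure, scaling, monitoring, on-call
    finetune_compute_monthly: float  # fine-tuning and experimentation
    hours_per_month: float = 730.0

    def gpu_monthly(self) -> float:
        return self.gpu_hourly_rate * self.gpu_count * self.hours_per_month

    def idle_waste(self, utilization: float) -> float:
        """Share of the GPU bill spent on idle hardware at a given utilization."""
        return self.gpu_monthly() * (1 - utilization)

    def monthly_total(self) -> float:
        return self.gpu_monthly() + self.ops_eng_monthly + self.finetune_compute_monthly


# Hypothetical figures for illustration only.
closed = ClosedTCO(1_000, 5.0, 0.2, 2_000, 0.1)
open_ = OpenTCO(2.5, 4, 6_000, 500)

print(f"Closed TCO per month: {closed.monthly_total():,.0f}")
print(f"Open TCO per month:   {open_.monthly_total():,.0f}")
print(f"Of which idle GPU at 40% utilization: {open_.idle_waste(0.4):,.0f}")
```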
Quantifying the Benefit Side
ROI is net return divided by cost, so the return needs a number too.
Where the value comes from
- Labor displaced or augmented: Hours saved per task times tasks per month times loaded cost per hour.
- Revenue enabled: New features, faster turnaround, or higher capacity that the model unlocks.
- Cost avoided: Cheaper inference replacing a more expensive approach.
For most teams, the benefit is dominated by labor and revenue, not by shaving cents off inference. This is why leading with per-token cost is a mistake — the savings there are usually a rounding error next to the value the model produces.
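The labor formula in the list above (hours saved per task times tasks per month times loaded cost per hour) can be put into a few lines, together with the other two value sources. This is a sketch with hypothetical inputs; substitute your own measurements.

```python
def monthly_labor_value(hours_saved_per_task, tasks_per_month, loaded_cost_per_hour):
    """Labor displaced or augmented, in USD per month."""
    return hours_saved_per_task * tasks_per_month * loaded_cost_per_hour


def monthly_benefit(labor_value, revenue_enabled=0.0, cost_avoided=0.0):
    """Total monthly value from the three sources in the list above."""
    return labor_value + revenue_enabled + cost_avoided


# Hypothetical inputs: 15 minutes saved on each of 4,000 tasks at $60 per loaded hour.
labor = monthly_labor_value(0.25, 4_000, 60.0)
total = monthly_benefit(labor, revenue_enabled=10_000.0, cost_avoided=1_500.0)

print(f"Labor value:   {labor:,.0f}")
print(f"Total benefit: {total:,.0f}")
print(f"Share from cheaper inference: {1_500.0 / total:.1%}")
```

With inputs like these, the inference saving is a small share of the total, which is the pattern the paragraph above describes.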
Payback Period and the Decision
The simple model
Payback period equals fixed investment divided by monthly net savings. A closed approach often has a near-zero payback period because there is almost no upfront investment: you are profitable from request one if the value of each request exceeds its per-token cost. An open approach has a real payback period because you front-load GPU and engineering cost, then recover it through low marginal cost at volume.
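The payback formula is simple enough to compute directly. The sketch below assumes "monthly net savings" means monthly benefit minus monthly running cost, and treats a non-positive saving as never paying back. The figures are hypothetical.

```python
import math


def payback_months(upfront_investment, monthly_net_savings):
    """Months to recover the upfront investment; infinite if savings are not positive."""
    if monthly_net_savings <= 0:
        return math.inf
    return upfront_investment / monthly_net_savings


# Hypothetical figures for illustration only.
open_payback = payback_months(upfront_investment=90_000, monthly_net_savings=15_000)
closed_payback = payback_months(upfront_investment=0, monthly_net_savings=8_000)

print(f"Open payback:   {open_payback:.1f} months")
print(f"Closed payback: {closed_payback:.1f} months")
```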
How to present it
Show both options side by side over a realistic time horizon — say 12 to 24 months — with your actual projected volume. Plot total cost for each. The decision-maker can see exactly where and whether open overtakes closed. This single chart wins more approvals than any amount of technical argument.
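The data behind that chart can be generated with a short loop. The sketch below accumulates total cost for both options month by month under an assumed constant growth rate, then reports the first month in which open's cumulative cost drops below closed's. The function names, the growth model, and all inputs are hypothetical.

```python
def cumulative_costs(months, start_mtok, monthly_growth, closed_price_per_mtok,
                     open_fixed_monthly, open_upfront=0.0, open_marginal_per_mtok=0.0):
    """Return (month, volume, closed_total, open_total) rows over the horizon."""
    rows = []
    closed_total, open_total = 0.0, float(open_upfront)
    volume = float(start_mtok)
    for month in range(1, months + 1):
        closed_total += volume * closed_price_per_mtok
        open_total += open_fixed_monthly + volume * open_marginal_per_mtok
        rows.append((month, volume, closed_total, open_total))
        volume *= 1 + monthly_growth  # assumed compound growth in volume
    return rows


def first_month_open_cheaper(rows):
    """First month where cumulative open cost is strictly below closed, else None."""
    for month, _, closed_total, open_total in rows:
        if open_total < closed_total:
            return month
    return None


# Hypothetical 24-month projection.
rows = cumulative_costs(months=24, start_mtok=1_000, monthly_growth=0.05,
                        closed_price_per_mtok=5.0, open_fixed_monthly=9_000,
                        open_upfront=30_000)

print(f"{'month':>5} {'Mtok':>8} {'closed':>12} {'open':>12}")
for month, volume, closed_total, open_total in rows:
    print(f"{month:>5} {volume:>8,.0f} {closed_total:>12,.0f} {open_total:>12,.0f}")
print("Open overtakes closed in month:", first_month_open_cheaper(rows))
```

The rows can be passed to any plotting tool; the two cumulative columns are the two lines on the chart.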
The Risk-Adjusted View
Finance teams discount uncertain returns. Acknowledge the risks explicitly: closed exposes you to vendor price changes and lock-in; open exposes you to execution risk if you cannot staff the ops. A common winning recommendation is to start closed for fast, low-risk validation, then migrate high-volume workloads to open once the volume justifies the investment — captured cleanly in the step-by-step migration guide.
A Worked Numerical Example
Abstract cost curves convince no one; a concrete example does. Walk a decision-maker through a realistic scenario rather than a formula.
The setup
Imagine a workload of steady high volume — millions of tokens per day — running a task a strong open model handles well. Lay out two columns. The closed column has zero fixed cost and a per-token rate that, multiplied by monthly volume, produces a monthly bill that scales directly with usage. The open column has a fixed monthly cost: GPU rental sized for the load, plus a fraction of an engineer's time for operations, plus some idle-capacity waste.
Reading the result
At low volume, the closed column is far cheaper because the open column is paying for mostly idle GPUs. As volume climbs, the closed bill rises linearly while the open cost stays roughly flat — until at some monthly volume the two lines cross. Past that point, every additional unit of volume widens open's advantage. The decision-maker can see precisely where their projected volume sits relative to that crossover, and whether the gap is large enough to justify the execution risk of self-hosting. This single comparison, with your real numbers, persuades better than any qualitative argument about openness.
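To put numbers on the two columns, the sketch below sweeps a range of monthly volumes and prints both bills side by side. Every constant is a hypothetical placeholder, and the open column assumes whole GPUs are rented to cover the load, so its cost rises in steps rather than staying perfectly flat.

```python
import math

# Hypothetical constants, not real prices or throughput figures.
API_PRICE_PER_MTOK = 5.0     # closed API, USD per million tokens
GPU_MONTHLY = 1_800.0        # rental per GPU, USD per month
GPU_CAPACITY_MTOK = 1_500.0  # millions of tokens one GPU serves per month at full load
OPS_MONTHLY = 5_000.0        # fraction of an engineer's time, USD per month


def gpus_needed(mtok):
    return max(1, math.ceil(mtok / GPU_CAPACITY_MTOK))


def closed_bill(mtok):
    return mtok * API_PRICE_PER_MTOK


def open_bill(mtok):
    return gpus_needed(mtok) * GPU_MONTHLY + OPS_MONTHLY


def idle_fraction(mtok):
    """Share of rented GPU capacity that sits unused at this volume."""
    return 1 - mtok / (gpus_needed(mtok) * GPU_CAPACITY_MTOK)


print(f"{'Mtok/mo':>8} {'closed':>10} {'open':>10} {'idle':>6}  cheaper")
for mtok in (100, 500, 1_000, 1_500, 3_000, 6_000):
    c, o = closed_bill(mtok), open_bill(mtok)
    winner = "open" if o < c else "closed"
    print(f"{mtok:>8,} {c:>10,.0f} {o:>10,.0f} {idle_fraction(mtok):>6.0%}  {winner}")
```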
Presenting to the Decision-Maker
When you bring this to a budget meeting, lead with the payback period and the value created, show the crossover chart as supporting evidence, and name the risks explicitly so finance does not feel sold to. Acknowledge that the open estimate depends on hitting your utilization assumption, and that the closed estimate depends on the provider not raising prices. Decision-makers trust a recommendation that names its own weaknesses far more than one that claims certainty. The for-teams guide covers how to keep this analysis consistent across an organization.
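The two dependencies named above can be shown as a small sensitivity grid: how open's monthly advantage moves if only part of the planned volume materializes, and if the provider changes its price. This is a sketch under a simplified assumption that the open cost is fully fixed; the function name and inputs are hypothetical.

```python
def open_advantage(planned_mtok, realized_share, api_price_per_mtok,
                   price_change, open_fixed_monthly):
    """Monthly closed bill minus monthly open cost; positive means open is cheaper."""
    closed_bill = planned_mtok * realized_share * api_price_per_mtok * (1 + price_change)
    return closed_bill - open_fixed_monthly


# Hypothetical plan.
PLANNED_MTOK = 3_000.0
API_PRICE = 5.0
OPEN_FIXED = 10_000.0

for realized_share in (0.5, 0.75, 1.0):        # fraction of planned volume that shows up
    for price_change in (-0.2, 0.0, 0.2):      # provider price move
        delta = open_advantage(PLANNED_MTOK, realized_share, API_PRICE,
                               price_change, OPEN_FIXED)
        print(f"volume {realized_share:.0%}, price {price_change:+.0%}: "
              f"open advantage {delta:+,.0f} per month")
```

Showing the cells where the sign flips is a compact way to name the recommendation's weak points.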
Beyond Cost: The Strategic Value Lines
A complete business case includes value that does not fit neatly into a single cost line but still matters to decision-makers.
Optionality and lock-in
A provider-agnostic architecture has real financial value even before you switch anything: it caps your exposure to a single vendor's price hikes and gives you negotiating leverage. Frame this as risk reduction. A decision-maker who has lived through a surprise vendor price increase understands immediately why paying a small premium for optionality is worth it.
Compliance as an enabler, not just a cost
For regulated workloads, self-hosting open models is sometimes the only way to enter a market or close a deal that requires data to stay on controlled infrastructure. Here the open option is not a cost optimization — it is a revenue enabler, because it unlocks business that a closed API would disqualify you from. That reframing can flip an ROI case that looked marginal on inference cost alone. The risks article covers the compliance dimension in detail.
Frequently Asked Questions
At what volume does open become cheaper than closed?
There is no universal number, but the crossover is driven by GPU utilization. Open typically wins only at high, steady volume — often millions of tokens per day with consistent load. At low or spiky volume, idle GPU cost makes open more expensive. Model your specific utilization to find the line.
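To see why utilization drives the answer, the sketch below computes the effective cost per million tokens of a self-hosted setup at different utilization levels, plus the utilization at which it matches an API price. It assumes a fixed monthly cost and a known full-load capacity; both function names and all figures are hypothetical.

```python
def effective_cost_per_mtok(fixed_monthly_cost, capacity_mtok_per_month, utilization):
    """Self-hosted cost per million tokens actually served."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return fixed_monthly_cost / (capacity_mtok_per_month * utilization)


def breakeven_utilization(fixed_monthly_cost, capacity_mtok_per_month, api_price_per_mtok):
    """Utilization at which self-hosted cost per token equals the API price."""
    return fixed_monthly_cost / (capacity_mtok_per_month * api_price_per_mtok)


# Hypothetical setup: $12,000 per month fixed, 6,000M tokens per month at full load.
FIXED, CAPACITY, API_PRICE = 12_000.0, 6_000.0, 5.0

for u in (0.1, 0.25, 0.5, 1.0):
    print(f"utilization {u:>4.0%}: {effective_cost_per_mtok(FIXED, CAPACITY, u):6.2f} per Mtok")
print(f"Break-even utilization: {breakeven_utilization(FIXED, CAPACITY, API_PRICE):.0%}")
```

A break-even utilization above 1.0 would mean the setup cannot beat the API price even at full load.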
Should I lead the business case with per-token cost savings?
No. For most use cases the inference cost is small next to the labor saved or revenue enabled. Lead with the value the model produces and the payback period, then treat per-token cost as a secondary detail. Finance cares about total return, not unit pricing.
How do I account for engineering time in open model ROI?
Treat inference infrastructure, scaling, monitoring, and on-call as ongoing operational cost, loaded at fully burdened salary rates. This is the most commonly underestimated line item and often the deciding factor that tips a marginal case back toward closed.
What is the safest ROI strategy?
Start closed to validate value with near-zero upfront cost and fast time-to-market. Once a workload proves high and steady volume, build the business case to migrate it to open and capture the marginal-cost advantage. This sequencing defers the fixed investment until the volume that justifies it has been demonstrated.
Key Takeaways
- Closed has variable cost and zero fixed cost; open inverts it. The crossover point is the whole case.
- Price full TCO, including GPU idle time and engineering, not just inference.
- Lead the case with value created and payback period, not per-token savings.
- A side-by-side cost chart over 12 to 24 months wins approvals.
- Start closed to validate, then migrate high-volume workloads to open when volume justifies it.