Most teams treat open vs closed as a one-time decision: a meeting, a slide, a verdict, done. That's why so many of them are stuck a year later running the wrong model on the wrong workload with no plan to move. The model question is not a decision; it's a portfolio you manage over time as costs shift, volume grows, and requirements change.
A playbook is different from a guide. A guide explains the concepts; a playbook tells you which move to make when a specific condition is true, who makes it, and what comes next. Below are the plays, the triggers that should fire each one, the owner accountable, and the sequence that keeps you from running them out of order. If you need the conceptual grounding first, read the Complete Guide to Open vs Closed Source AI Models, then come back here to operationalize it.
The Plays at a Glance
There are five plays. You won't run them all, and you definitely won't run them simultaneously. Each has a trigger and an owner.
- Play 1: Validate Closed. Ship on a closed API to prove the product works.
- Play 2: Instrument Cost. Measure real token volume and spend before deciding anything.
- Play 3: Pilot Open. Stand up an open model on your highest-volume, lowest-risk workload.
- Play 4: Route Hybrid. Split traffic by risk and cost across both.
- Play 5: Consolidate. Kill the system you no longer need.
The discipline is in the triggers. Running Play 3 before Play 2 is how teams burn a quarter self-hosting a workload that was never expensive enough to matter.
Play 1: Validate Closed
Trigger: You have an unproven product idea and no production AI in the loop yet.
Owner: Product engineering lead.
Start closed, always, unless a hard regulatory rule forbids data leaving your perimeter. The goal here is speed to a working product, not cost optimization. You're answering "does this feature even work and do users want it?" A closed API removes every infrastructure question from that experiment.
What done looks like
- The feature is in production behind a feature flag.
- You've abstracted model calls behind a single internal interface, so no SDK call is scattered across the codebase.
- You have a baseline eval set of real inputs and acceptable outputs.
That abstraction layer is the most important deliverable of this play. It's what makes every later play possible. Skip it and you've quietly committed to your first vendor forever.
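A minimal sketch of what that single internal interface could look like, in Python. Every name here (Completion, ModelBackend, EchoBackend, ModelClient) is hypothetical, and the word-count "tokens" are a placeholder for whatever usage figures a real vendor response reports. The point is only that feature code talks to one object, and the backend behind it can be swapped without touching call sites.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    text: str
    tokens_in: int
    tokens_out: int
    backend: str  # which backend produced this result


class ModelBackend(Protocol):
    """Anything with a name and a complete() method can sit behind the client."""

    name: str

    def complete(self, prompt: str) -> Completion: ...


class EchoBackend:
    """Stand-in backend; a real one would wrap a vendor SDK or a self-hosted server."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> Completion:
        words = prompt.split()
        # Word counts are a placeholder for real token counts.
        return Completion(prompt.upper(), len(words), len(words), self.name)


class ModelClient:
    """The only object feature code is allowed to call."""

    def __init__(self, backend: ModelBackend) -> None:
        self._backend = backend

    def swap_backend(self, backend: ModelBackend) -> None:
        self._backend = backend

    def complete(self, prompt: str) -> Completion:
        return self._backend.complete(prompt)


client = ModelClient(EchoBackend("closed-api"))
first = client.complete("classify this ticket")
client.swap_backend(EchoBackend("open-selfhosted"))
second = client.complete("classify this ticket")
```

Because Completion carries token counts and the backend name, the same interface also gives Play 2 its raw data for free.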
Play 2: Instrument Cost
Trigger: The closed-API feature is live and getting real traffic.
Owner: Engineering lead with finance visibility.
You cannot make a sound open vs closed call without numbers, and most teams argue the decision on vibes. This play installs the meter.
What to measure
- Tokens in and out per request, aggregated daily and by feature.
- Cost per task and cost per active user.
- The shape of the curve: is volume flat, linear, or compounding?
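The measurements above can be sketched as a few aggregation functions over per-request usage records. This is an illustration under stated assumptions: the UsageRecord fields and the two per-1,000-token prices are hypothetical, so substitute your vendor's actual rates and whatever your logging pipeline records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UsageRecord:
    day: date
    feature: str
    user_id: str
    tokens_in: int
    tokens_out: int


# Hypothetical prices in dollars per 1,000 tokens.
PRICE_IN_PER_1K = 0.50
PRICE_OUT_PER_1K = 1.50


def request_cost(rec: UsageRecord) -> float:
    """Dollar cost of one request at the assumed prices."""
    return (rec.tokens_in / 1000) * PRICE_IN_PER_1K + (rec.tokens_out / 1000) * PRICE_OUT_PER_1K


def daily_cost_by_feature(records: List[UsageRecord]) -> Dict[Tuple[date, str], float]:
    """Total spend keyed by (day, feature)."""
    totals: Dict[Tuple[date, str], float] = defaultdict(float)
    for rec in records:
        totals[(rec.day, rec.feature)] += request_cost(rec)
    return dict(totals)


def cost_per_task(records: List[UsageRecord]) -> float:
    """Average spend per request; 0.0 when there is no traffic."""
    if not records:
        return 0.0
    return sum(request_cost(r) for r in records) / len(records)


def cost_per_active_user(records: List[UsageRecord]) -> float:
    """Total spend divided by distinct users seen; 0.0 when there is no traffic."""
    users = {r.user_id for r in records}
    if not users:
        return 0.0
    return sum(request_cost(r) for r in records) / len(users)
```

Plotting the daily totals per feature over a few weeks is what reveals whether the curve is flat, linear, or compounding.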
Run this for at least a few weeks of representative traffic. The output is a single answer: are we anywhere near the volume where self-hosting could pay off? If you're spending a few hundred dollars a month, the answer is no: stay on the closed API, keep the meter running, and don't start Play 3 until its trigger fires. The Real-World Examples and Use Cases piece shows what these curves look like for different product shapes.
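One rough way to turn the meter's output into that single answer is to project spend forward and see when, if ever, it crosses an estimated monthly cost of self-hosting. The sketch below assumes a constant month-over-month growth rate and a single all-in self-hosting figure (hardware plus engineering time); both are simplifications, and the function name and parameters are hypothetical.

```python
from typing import Optional


def months_until_break_even(
    current_monthly_spend: float,
    monthly_growth: float,
    selfhost_monthly_cost: float,
    horizon_months: int = 24,
) -> Optional[int]:
    """Return the first month (0 = now) in which projected API spend reaches
    the assumed self-hosting cost, or None if it never does within the horizon.

    monthly_growth is a fraction: 0.0 is flat, 0.10 is 10% compounding growth.
    """
    spend = current_monthly_spend
    for month in range(horizon_months + 1):
        if spend >= selfhost_monthly_cost:
            return month
        spend *= 1 + monthly_growth
    return None


# Hypothetical figures: a few hundred dollars a month, flat, against a
# $6,000/month self-hosting estimate never breaks even within two years.
flat_small = months_until_break_even(300, 0.0, 6000)

# $3,000/month compounding at 50% crosses $6,000 in month 2 (3000, 4500, 6750).
fast_growth = months_until_break_even(3000, 0.5, 6000)
```

A result of None, or a number far in the future, is the signal to stay in Play 2.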
Play 3: Pilot Open
Trigger: Instrumented spend is high and steady, and you have a workload that is high-volume and low-risk.
Owner: A dedicated infra or ML engineer, not a side project.
Now, and only now, you pilot an open model. Pick the single worst offender on your cost report that also has the lowest risk profile, usually a classification, extraction, or summarization task. Don't pilot open on your flagship reasoning feature; pilot it where being slightly worse is invisible to users.
Run it as a real experiment
- Stand up the open model behind the same abstraction interface.
- Run your eval set against both models and compare quality head to head.
- Shadow-test: send a slice of real traffic to both, log both, compare, but only serve the closed result until you trust the open one.
The failure mode here is declaring victory on benchmark scores. Benchmarks aren't your workload. Trust the shadow test on real inputs.
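The shadow test described above can be sketched as a wrapper that always serves the closed result and, for a sampled slice of traffic, also calls the open model and logs the pair. The names (ShadowLog, shadow_call) and the exact-match agreement metric are assumptions for illustration. Exact match suits classification or extraction; summarization would need a grader instead. In production the shadow call would normally run off the request path so it adds no user-facing latency.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

ModelFn = Callable[[str], str]


@dataclass
class ShadowLog:
    # Each entry: (prompt, closed_output, open_output or None if the open model failed).
    pairs: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def agreement_rate(self) -> float:
        """Share of shadowed requests where both models gave the same output."""
        if not self.pairs:
            return 0.0
        same = sum(1 for _, closed_out, open_out in self.pairs if closed_out == open_out)
        return same / len(self.pairs)


def shadow_call(
    prompt: str,
    closed_model: ModelFn,
    open_model: ModelFn,
    log: ShadowLog,
    sample_rate: float = 0.1,
    rng: Callable[[], float] = random.random,
) -> str:
    """Serve the closed model's answer; shadow a sample of traffic to the open model."""
    served = closed_model(prompt)
    if rng() < sample_rate:
        try:
            candidate: Optional[str] = open_model(prompt)
        except Exception:
            # An open-model failure must never affect what the user sees.
            candidate = None
        log.pairs.append((prompt, served, candidate))
    return served
```

The logged pairs double as new eval cases: every disagreement is a real input worth reviewing by hand.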
Play 4: Route Hybrid
Trigger: The open pilot passes quality bars on its target workload and the cost delta is meaningful.
Owner: Platform team.
This is the steady state most mature teams land in. You route by two axes: cost and risk.
The routing rules
- High-volume, low-risk, quality-tolerant tasks go to the self-hosted open model.
- Frontier reasoning, sensitive data with strict contracts, and anything both user-facing and hard go to the closed model.
- A fallback path sends open-model failures to the closed model automatically.
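The three routing rules can be sketched as one function. This is a simplified illustration: the Task flags are hypothetical and would in practice come from per-feature configuration, and treating any exception as an open-model failure is an assumption (a real router might also fall back on low-confidence or malformed outputs).

```python
from dataclasses import dataclass
from typing import Callable, Tuple

ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Task:
    prompt: str
    high_volume: bool
    low_risk: bool
    quality_tolerant: bool


def route(task: Task, open_model: ModelFn, closed_model: ModelFn) -> Tuple[str, str]:
    """Return (path_taken, output) for a task.

    Only tasks that are high-volume, low-risk, AND quality-tolerant go to the
    open model; everything else goes to the closed model. If the open model
    raises, the request falls back to the closed model automatically.
    """
    if task.high_volume and task.low_risk and task.quality_tolerant:
        try:
            return "open", open_model(task.prompt)
        except Exception:
            return "closed-fallback", closed_model(task.prompt)
    return "closed", closed_model(task.prompt)
```

Returning the path taken alongside the output makes it easy to track the fallback rate, which is one early signal that the routing logic or the open model is degrading.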
The cost of this play is real: you now operate two systems, two eval suites, and routing logic that can itself break. Budget for that complexity. For teams below serious scale, the hybrid tax isn't worth it, which is why Play 4 has a trigger and isn't the default. Best Practices That Actually Work goes deeper on keeping a hybrid system maintainable.
Play 5: Consolidate
Trigger: One side of your hybrid setup has shrunk to a rounding error, or a market shift changed the math.
Owner: Engineering lead.
Portfolios accumulate cruft. If closed API prices drop far enough, your self-hosted savings can evaporate and the right move is to consolidate back to closed and shut down the GPUs. If an open model leaps forward, you might consolidate the other way. This play exists to make sure someone is empowered to delete a system, because nobody ever wants to admit a past decision expired. Review the portfolio quarterly and be willing to run this play.
Sequencing and Ownership
The order is non-negotiable: validate, instrument, pilot, route, consolidate. You can stop early when a trigger never fires, but you can't skip ahead. The most common organizational failure is jumping to Play 3 or 4 because self-hosting feels sophisticated, before Play 2 has proven there's anything to optimize. Assign one accountable owner per play and put the quarterly portfolio review on a recurring calendar invite. A model strategy without a standing owner decays into whatever the loudest engineer prefers.
Frequently Asked Questions
How is a playbook different from just having a strategy?
A strategy states intent; a playbook states the conditional moves. The value is in the triggers: knowing that "spend is high and steady" fires the open pilot and nothing else does. It turns a debate into a checklist that anyone on the team can execute.
What if regulation forces open from day one?
Then you run a modified Play 1 where "validate" happens on a self-hosted open model inside your perimeter. You lose the speed advantage of closed, so budget extra time for the prototype. The rest of the sequence still applies.
Who should own the model portfolio overall?
A single engineering leader with finance visibility, because the decision is half technical and half cost. Distributing it across teams guarantees inconsistent choices and no one running Play 5 when a system should die.
How often should we re-run the plays?
Instrument continuously, review the portfolio quarterly, and re-pilot whenever a major model release or a significant price change moves the break-even point. The math here expires faster than most architectural decisions.
Can a small team skip straight to hybrid?
Almost never. Hybrid's operational tax only pays off at scale. Small teams should live in Plays 1 and 2 and resist the pull toward complexity until the cost data demands it.
Key Takeaways
- Treat open vs closed as a managed portfolio, not a one-time verdict.
- The five plays run in strict order: validate closed, instrument cost, pilot open, route hybrid, consolidate.
- Triggers are the whole point; never pilot open before cost data proves there's something to save.
- Build the model abstraction layer in Play 1 or every later play becomes impossible.
- Assign one accountable owner and review the portfolio quarterly, because the math expires.