The Pre-Launch Modality Checklist Worth Printing

A checklist is only useful if you actually trust it enough to follow it under deadline pressure. The list below is built for that moment: you have a feature that accepts a photo, an audio clip, or a document, or that returns structured data, and you are about to ship it. Run these items first.

Every item here addresses a real, recurring way that AI model input and output modalities trip teams up. None of them is busywork. Each carries a short justification so you know why it matters and can judge when an item genuinely does not apply to your case. Treat unjustified deviations as risk you are accepting on purpose.

Use it as a working tool, not a reading exercise. Copy the items into your release process, and make passing them a gate rather than a suggestion. The whole point is to convert hard-won lessons into a routine you do not have to rediscover each time.

Before You Build

Confirm input and output modalities separately

[ ] Verify the model accepts every input modality you will send. Input support is never guaranteed; confirm it in the docs.
[ ] Verify the model produces every output modality you need. Output support is separate from input; a model that reads images may not generate them.

These two checks prevent the most expensive early mistake. Our step-by-step guide makes them step one for the same reason.
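
A minimal sketch of the two checks as code, assuming you keep a small capability table transcribed by hand from the model's documentation. The model name, the table contents, and the `missing_modalities` helper are all hypothetical; nothing here queries a real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Modalities a model supports, transcribed by hand from its docs."""
    inputs: frozenset
    outputs: frozenset


# Hypothetical entry; replace with what the documentation actually states.
CAPABILITIES = {
    "example-model": Capabilities(
        inputs=frozenset({"text", "image"}),
        outputs=frozenset({"text"}),
    ),
}


def missing_modalities(model, needed_inputs, needed_outputs):
    """Return (missing_inputs, missing_outputs), checked independently."""
    caps = CAPABILITIES[model]
    return (
        set(needed_inputs) - caps.inputs,
        set(needed_outputs) - caps.outputs,
    )


# This hypothetical model reads images but does not generate them.
missing_in, missing_out = missing_modalities(
    "example-model",
    needed_inputs={"text", "image"},
    needed_outputs={"text", "image"},
)
```

Keeping the two results separate is the point: an empty `missing_in` says nothing about `missing_out`.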

Scope the modality set

[ ] List the minimum modalities that solve the user's job. Every extra modality adds cost, latency, and failure modes.
[ ] Reject any modality that exists only to impress. Novelty is not a user need; cut it.

Before You Test

Prepare realistic inputs

[ ] Assemble a corpus of worst-case inputs. Blurry photos, noisy audio, oddly formatted documents, whatever you will actually receive.
[ ] Make passing the worst-case corpus a release gate. Clean inputs hide the failures that matter, as our common-mistakes article details.
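
One way to turn the corpus into a gate is a small runner that fails the release when any worst-case input fails. This is a sketch under assumptions: `corpus_gate`, the case format, and the stand-in `fake_transcribe` handler are hypothetical names for illustration, and a real corpus would load files rather than inline strings.

```python
def corpus_gate(cases, handler, min_pass_rate=1.0):
    """Run every worst-case input through the feature and decide the gate.

    cases: list of (name, input_value, check) where check(output) -> bool.
    handler: the feature under test; any exception counts as a failure.
    """
    failures = []
    for name, value, check in cases:
        try:
            ok = bool(check(handler(value)))
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate >= min_pass_rate, failures


# Stand-in handler: pretends to transcribe, and fails on empty input.
def fake_transcribe(raw):
    if not raw.strip():
        raise ValueError("nothing to transcribe")
    return raw.strip().lower()


cases = [
    ("noisy_audio", "  HELLO  ", lambda out: out == "hello"),
    ("silent_clip", "   ", lambda out: out == ""),
]
passed, failed_cases = corpus_gate(cases, fake_transcribe)
```

Returning the names of the failing cases, not just a boolean, gives the reviewer something concrete to fix.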

Define the output contract

[ ] Decide whether a human or software consumes the output. This determines whether you need structured output or prose.
[ ] Define a strict schema if software consumes it. A predictable shape prevents fragile downstream parsing.
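
As an illustration of what "strict" can mean, the sketch below rejects malformed JSON, missing fields, extra fields, and wrong types using only the standard library. The invoice fields and the `parse_strict` helper are hypothetical; a schema library would serve the same purpose.

```python
import json

# Hypothetical contract: field name -> required Python type.
INVOICE_SCHEMA = {"vendor": str, "total_cents": int, "currency": str}


def parse_strict(raw, schema):
    """Parse model output and reject anything outside the contract."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(schema):
        mismatch = sorted(set(data) ^ set(schema))
        raise ValueError(f"unexpected or missing fields: {mismatch}")
    for field, expected in schema.items():
        # Exact type match, so True is not accepted where an int is required.
        if type(data[field]) is not expected:
            raise ValueError(f"{field} must be {expected.__name__}")
    return data


good = parse_strict(
    '{"vendor": "Acme", "total_cents": 1299, "currency": "USD"}',
    INVOICE_SCHEMA,
)
```

Every rejection surfaces as a `ValueError`, so downstream code has a single failure path to handle instead of fragile ad hoc parsing.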

Before You Measure

Budget per modality

[ ] Measure cost on realistic, production-sized requests. Image and video cost tends to scale with resolution and length; toy examples lie.
[ ] Cap input size and count where cost scales with them. Uncapped rich inputs produce surprise invoices.
[ ] Measure latency for every non-text output. Generated images and speech add seconds, not milliseconds.

The reasoning behind these multipliers lives in our definitive guide.
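
A rough sketch of capping and estimating together, assuming image cost is roughly proportional to pixel count. That proportionality, the caps, and the price are made-up placeholders, not real provider pricing; check your provider's documentation for the actual formula.

```python
# All figures below are made-up placeholders, not real provider pricing.
MAX_IMAGES = 4
MAX_IMAGE_PIXELS = 2_000_000
PRICE_PER_MEGAPIXEL = 0.002  # hypothetical, in dollars


def estimate_image_cost(image_sizes):
    """Estimate request cost from (width, height) pairs, enforcing caps first."""
    if len(image_sizes) > MAX_IMAGES:
        raise ValueError(f"too many images: {len(image_sizes)} > {MAX_IMAGES}")
    total_pixels = 0
    for width, height in image_sizes:
        pixels = width * height
        if pixels > MAX_IMAGE_PIXELS:
            raise ValueError(f"image too large: {width}x{height}")
        total_pixels += pixels
    return total_pixels / 1_000_000 * PRICE_PER_MEGAPIXEL


toy = estimate_image_cost([(100, 100)])              # a thumbnail in a demo
realistic = estimate_image_cost([(1600, 1200)] * 4)  # four phone photos
```

Under these placeholder numbers the realistic request costs several hundred times the toy one, which is the gap that testing on small examples hides.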

Before You Ship

Validate and fall back

[ ] Validate every output at the boundary. Treat model output as untrusted until proven otherwise.
[ ] Define an explicit fallback for failed validation. Retry, default, or surfaced error; never silent acceptance.
[ ] Route low-confidence results to a human where stakes are high. A safety net is what makes automation acceptable.
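
The three items above can be combined into one small control flow. This sketch assumes the model call returns a value together with a confidence score, which not every model or API provides; `run_with_fallback`, the status strings, and the thresholds are hypothetical.

```python
def run_with_fallback(call_model, validate, max_attempts=2, min_confidence=0.8):
    """Return (status, value) where status is 'ok', 'needs_review', or 'error'.

    call_model() -> (value, confidence); validate(value) -> bool.
    """
    for _ in range(max_attempts):
        value, confidence = call_model()
        if not validate(value):
            continue  # invalid output: retry instead of accepting silently
        if confidence < min_confidence:
            return "needs_review", value  # valid but uncertain: a human decides
        return "ok", value
    return "error", None  # surfaced failure once retries are exhausted


# Scripted stand-in for a model: first reply is empty, second is usable.
responses = iter([("", 0.9), ("42 Main St", 0.95)])
status, value = run_with_fallback(lambda: next(responses), validate=bool)
```

Every path ends in an explicit status, so there is no branch in which an unvalidated value reaches the caller.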

Keep it maintainable

[ ] Isolate each modality behind a clean boundary. Modular handling makes later expansion and debugging cheap.
[ ] Background any slow, non-essential output. Never block the main interaction on a slow modality.
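
A small sketch of backgrounding with `asyncio`, assuming a feature that returns text immediately and an optional spoken version later. The two coroutines are stand-ins for real model calls, and the short sleep stands in for seconds of generation time.

```python
import asyncio


async def fast_text_reply(prompt):
    return f"Summary of: {prompt}"


async def slow_audio_version(text):
    await asyncio.sleep(0.05)  # stands in for seconds of speech generation
    return f"<audio for {len(text)} chars>"


async def handle_request(prompt):
    """Return the text immediately and hand back a task for the slow output."""
    text = await fast_text_reply(prompt)
    audio_task = asyncio.create_task(slow_audio_version(text))
    return text, audio_task


async def main():
    text, audio_task = await handle_request("quarterly report")
    audio_ready_at_reply = audio_task.done()  # the text did not wait for audio
    audio = await audio_task  # collected later, e.g. when the user presses play
    return text, audio_ready_at_reply, audio


text, audio_ready_at_reply, audio = asyncio.run(main())
```

The caller holds a reference to the task, so the slow output can be awaited, cancelled, or dropped without affecting the main reply.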

After You Ship

The checklist does not end at launch. The most damaging modality problems often appear only when real traffic arrives in volume and variety, so a few post-launch checks keep a working feature working.

Monitor the things that drift

[ ] Track the rate of validation failures over time. A rising failure rate signals that real inputs are drifting away from what you tested.
[ ] Watch per-modality cost as volume grows. A feature that was affordable at launch can become unsustainable as users send more or larger inputs.
[ ] Sample the human-fallback queue regularly. The cases that land there tell you exactly where the model is weakest and where to invest next.
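
Tracking the failure rate can start as a rolling window in process before it becomes a dashboard. The class name, window size, and alert threshold below are hypothetical choices for illustration.

```python
from collections import deque


class FailureRateMonitor:
    """Rolling validation-failure rate over the most recent `window` results."""

    def __init__(self, window=100, alert_above=0.05):
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.alert_above = alert_above

    def record(self, passed):
        self.results.append(bool(passed))

    @property
    def failure_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self):
        return self.failure_rate > self.alert_above


monitor = FailureRateMonitor(window=10, alert_above=0.2)
for passed in [True] * 8 + [False] * 2:
    monitor.record(passed)
alert_at_launch = monitor.should_alert()  # 2 of 10 failed: at the threshold

for passed in [False] * 3:  # inputs start drifting
    monitor.record(passed)
alert_after_drift = monitor.should_alert()  # 5 of the last 10 failed
```

A rolling window reflects recent traffic only, which is what makes a rising rate visible instead of being averaged away by months of healthy history.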

Expand only on evidence

[ ] Add new modalities only after usage data justifies them. Post-launch is when you learn which expansions users actually need, rather than guessing.
[ ] Re-run the worst-case corpus after any model or prompt change. A change that improves one path can quietly regress another.
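
When re-running the corpus after a change, comparing per-case results catches regressions that an aggregate pass rate would hide. A minimal sketch, assuming each run is stored as a mapping from case name to pass or fail; the case names are invented.

```python
def find_regressions(baseline, current):
    """Cases that passed before the change and fail (or are missing) after it."""
    return sorted(
        name
        for name, ok in baseline.items()
        if ok and not current.get(name, False)
    )


baseline = {"blurry_receipt": True, "rotated_scan": False, "noisy_voicemail": True}
current = {"blurry_receipt": True, "rotated_scan": True, "noisy_voicemail": False}

regressions = find_regressions(baseline, current)
same_pass_count = sum(baseline.values()) == sum(current.values())
```

Here both runs pass two of three cases, so the headline number is unchanged even though one previously working path has broken.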

These post-launch items are what separate a feature that degrades silently from one that stays reliable. The same discipline that built it has to maintain it, and the best-practices article frames this as treating modality as an ongoing constraint rather than a one-time decision.

How to Use the Checklist

The list is ordered to match the natural sequence of a build: capability checks before you write code, input and contract decisions before you test, cost and latency measurement before you commit, and validation before you ship. Working top to bottom means each expensive decision rests on cheaper checks you already passed.

Do not treat every box as equally weighted. The four that prevent the most damage are: confirming output support separately, gating on worst-case inputs, defining a strict schema, and pairing validation with a fallback. If you are short on time, those four are non-negotiable. The rest sharpen a feature that is already sound. For the deeper reasoning behind each, the best-practices article expands every item into its full argument.

A second way to use the list is as a shared language during review. When two people disagree about whether a feature is ready, walking the checklist together turns a vague argument into a series of concrete yes-or-no questions. Either the worst-case corpus exists or it does not; either validation has a defined fallback or it does not. This removes most of the subjectivity from "is it ready?" and replaces it with evidence anyone on the team can verify.

Finally, keep the list living. Every time a feature breaks in a way the checklist did not catch, add an item that would have caught it. Over a few projects, the list stops being generic advice and becomes a record of your own team's hard-won lessons, which is when it earns the most trust. The underlying decisions it encodes are derived from the HAVE-NEED-BRIDGE framework, so the two tools reinforce each other: the framework chooses the modalities, and the checklist confirms you wired them safely.

Frequently Asked Questions

Why check input and output support as separate items?

Because they are separate capabilities. A model that accepts a modality may not produce it, and conflating the two is the most common early-stage mistake. Two distinct checks force you to confirm both directions explicitly.

Which checklist items are truly mandatory?

Confirming output support, gating on worst-case inputs, defining a strict output schema, and pairing validation with a fallback. These four prevent the failures that cost the most. The remaining items improve a feature that is already fundamentally sound.

How big should my worst-case input corpus be?

Large enough to cover the real range of quality you will receive, with emphasis on the genuinely bad examples. A dozen carefully chosen messy inputs that mirror production often catch more problems than a hundred clean ones.

Do I need structured output for every feature?

No, only when software consumes the result. If a human reads the output directly, prose is appropriate. The checklist asks you to decide the consumer first precisely so you apply the schema requirement only where it belongs.

Can I skip cost measurement for a text-only feature?

You can be lighter on it, since text is cheap and predictable. But the moment any image, audio, or video enters the input, measure cost on realistic requests, because those modalities carry multipliers that text does not.

Key Takeaways

Confirm input and output modality support as two separate checks before building.
Scope to the minimum modality set and gate on a worst-case input corpus.
Decide your output consumer first, then apply a strict schema wherever software reads the result.
Measure cost and latency per modality on realistic requests, and cap rich inputs.
Validate every output, pair it with a defined fallback, and keep modality handling modular.

Agency Script Editorial
