Seven Ways Structured Output Quietly Breaks in Production

Structured output rarely fails loudly. It fails the way a slow leak does—everything looks fine in testing, the demo is flawless, and then one record in a thousand arrives malformed and corrupts a downstream process you forgot was connected. By the time you notice, the bad data has propagated.

The frustrating part is that the failure modes are not exotic. They are the same handful of mistakes, made over and over, by teams who assumed the model's good behavior in development would hold under real traffic. This piece names seven of them. For each, we explain why it happens, what it actually costs, and the corrective practice that closes the gap.

If you are building a pipeline now, read this alongside the step-by-step approach—these mistakes map directly onto the steps people skip.

Mistake 1: Trusting That Valid JSON Means Correct Data

The most common error is assuming that because the output parsed, it is right. JSON mode guarantees syntax, nothing more. An empty object, an object with the wrong fields, or a perfectly formed object full of nonsense values all parse cleanly.

Why it happens: "It parsed" feels like success, so the validation step feels redundant.

What it costs: Garbage flows into your database wearing the disguise of clean data. The bug surfaces far downstream, where it is expensive to trace back.

The fix: Always validate against the schema and your business rules after parsing. Parsing and validating are two different jobs, as the complete guide lays out.
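To make the two jobs concrete, here is a minimal sketch using only the standard library. The field names (status, amount) and the allowed statuses are assumptions chosen for illustration, not part of any particular API.

```python
import json

# Hypothetical business rule: the only statuses the application understands.
ALLOWED_STATUSES = {"open", "pending", "closed"}


def validate_ticket(raw):
    """Parse model output, then check structure and business rules separately."""
    # Job 1: parsing. This only proves the text is syntactically valid JSON.
    data = json.loads(raw)

    # Job 2a: structure. The right fields exist and have the right types.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in (("status", str), ("amount", (int, float))):
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"wrong type for {field}")

    # Job 2b: business rules. The values make sense for this application.
    if data["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {data['status']!r}")
    if data["amount"] < 0:
        raise ValueError("amount must be non-negative")
    return data
```

An empty object and an object with a status the application has never heard of both get past json.loads, and both are rejected by the later checks.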

Mistake 2: Letting the Schema and Validator Drift Apart

Teams often write the schema they send to the model in one place and the validation logic in another, by hand. Over months, someone updates one and forgets the other.

Why it happens: The model instruction and the validator feel like different concerns owned by different parts of the code.

What it costs: The model produces exactly what its schema asked for, your validator rejects it because its copy is out of date, and you waste hours debugging a phantom model problem that is really a sync problem.

The fix: Derive both from a single schema definition. One source of truth, validator and instruction generated from it.
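One way this could look, as a sketch rather than a prescribed design: a single hypothetical SCHEMA table from which both the JSON Schema sent to the model and the validation logic are generated. The table layout, field names, and helper names are all assumptions made for this example; in practice a schema library can play the same role.

```python
# Hypothetical single source of truth:
# field name -> (Python type, required?, description)
SCHEMA = {
    "status": (str, True, "One of: open, closed."),
    "amount": (float, True, "Invoice total in USD."),
    "note": (str, False, "Free-text remark, if the input contains one."),
}

TYPE_NAMES = {str: "string", float: "number", int: "integer", bool: "boolean"}


def to_json_schema(schema):
    """Generate the schema that is sent to the model."""
    return {
        "type": "object",
        "properties": {
            name: {"type": TYPE_NAMES[py_type], "description": desc}
            for name, (py_type, _, desc) in schema.items()
        },
        "required": [name for name, (_, req, _) in schema.items() if req],
        "additionalProperties": False,
    }


def validate(schema, data):
    """Check a parsed response against the same definition; return error strings."""
    errors = []
    for name, (py_type, required, _) in schema.items():
        if name not in data:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        value = data[name]
        # JSON has one number type, so accept ints where a float is expected.
        accepted = (int, float) if py_type is float else py_type
        # bool is a subclass of int in Python; reject it unless bool was requested.
        is_stray_bool = isinstance(value, bool) and py_type is not bool
        if is_stray_bool or not isinstance(value, accepted):
            errors.append(f"wrong type for {name}: expected {TYPE_NAMES[py_type]}")
    errors.extend(f"unexpected field: {name}" for name in data if name not in schema)
    return errors
```

Because both functions read the same table, adding or renaming a field changes the instruction and the validator in one edit.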

Mistake 3: Skipping Field Descriptions

A schema with bare types—status: string, amount: number—tells the model the shape but not the meaning. The model guesses at intent and guesses wrong on edge cases.

Why it happens: Types alone make the schema look complete, so descriptions feel optional.

What it costs: Subtle, hard-to-reproduce content errors. The model puts plausible-but-wrong values in fields, and because the structure is correct, validation passes them through.

The fix: Describe every field, especially enums and ambiguous cases. Tell the model exactly when to pick each option.
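As an illustration, the sketch below shows a JSON Schema style definition where the enum description says when to pick each option, plus a small lint that flags fields left with bare types. The ticket fields and the helper name are hypothetical.

```python
# Hypothetical schema for a support-ticket extractor.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["open", "pending", "closed"],
            "description": (
                "Use 'open' if the customer is still waiting on us, "
                "'pending' if we are waiting on the customer, and "
                "'closed' only if the customer confirmed the issue is resolved."
            ),
        },
        "amount": {
            "type": "number",
            "description": "Refund requested, in USD, without the currency symbol.",
        },
        # Bare type, left in on purpose so the lint below has something to flag.
        "priority": {"type": "string"},
    },
}


def fields_missing_descriptions(schema):
    """Return the names of properties with no (or an empty) description."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if not spec.get("description", "").strip()
    ]
```

Running a check like this in the test suite is one way to keep undescribed fields from slipping in later.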

Mistake 4: No Retry Strategy

The pipeline works, so nobody builds a recovery path. The first malformed response throws an unhandled exception.

Why it happens: During development the model never misbehaves, so the failure path never gets exercised or built.

What it costs: A single bad response takes down the request, and over enough volume bad responses are inevitable. What should have been a recoverable blip becomes an outage.

The fix: Wrap the call in a bounded retry loop that feeds the validation error back to the model. Define a fallback for exhausted retries.
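A minimal sketch of such a loop follows. The call_model and validate arguments are hypothetical callables supplied by the caller: call_model takes a prompt string and returns the raw response text, and validate raises ValueError when the parsed data is unacceptable.

```python
import json


def extract_with_retries(call_model, prompt, validate, max_attempts=3, fallback=None):
    """Call the model up to max_attempts times, feeding each error back to it."""
    message = prompt
    for _ in range(max_attempts):
        raw = call_model(message)
        try:
            data = json.loads(raw)  # json.JSONDecodeError is a ValueError
            validate(data)          # expected to raise ValueError on bad data
            return data
        except ValueError as err:
            # Tell the model exactly what was wrong with its last reply.
            message = (
                f"{prompt}\n\n"
                f"Your previous reply was rejected: {err}\n"
                f"Previous reply: {raw}\n"
                "Return corrected JSON only."
            )
    # Retries exhausted: hand back the caller's fallback instead of crashing.
    return fallback
```

The fallback here is just a value to return; depending on the application it could equally be a dead-letter queue or a flag for human review.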

Mistake 5: Asking for Too Many Fields at Once

Bigger schemas feel more powerful, so teams cram every conceivable field into one request.

Why it happens: It seems efficient to get everything in a single call.

What it costs: Each additional field is another opportunity for error, and large schemas dilute the model's attention. Accuracy on each field tends to drop as the field count rises, and the response costs more tokens.

The fix: Request the minimum your application uses. If you need many fields, consider splitting into focused calls. The best practices guide covers scoping schemas well.
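Splitting into focused calls might be sketched as follows. The extract argument is a hypothetical callable that takes the document and a list of field names and returns a dict for just those fields; how it talks to the model is left out.

```python
def extract_in_groups(extract, document, field_groups):
    """Run one focused extraction per group of fields and merge the results."""
    result = {}
    for fields in field_groups:
        part = extract(document, fields)
        # Each call should return only the fields it was asked for.
        unexpected = set(part) - set(fields)
        if unexpected:
            raise ValueError(f"unexpected fields returned: {sorted(unexpected)}")
        result.update(part)
    return result
```

Each call carries a smaller schema, at the price of more requests, so whether the trade is worth it depends on how much accuracy the large schema was costing.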

Mistake 6: Ignoring Optional Versus Required

Marking everything required causes problems, and so does marking everything optional. All-required forces the model to invent values it does not have. All-optional lets it omit fields your code assumes exist.

Why it happens: Thinking carefully about which fields can legitimately be absent is tedious, so people pick one extreme.

What it costs: All-required produces hallucinated filler. All-optional produces missing-key errors deep in code that assumed the key was there.

The fix: Decide deliberately for each field. A field is required only if the model can always determine it from the input. Otherwise make it optional and handle absence explicitly.
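Here is one way the per-field decision could show up in code, using a hypothetical invoice with one required and one optional field. The names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    invoice_number: str             # required: assumed present on every invoice
    due_date: Optional[str] = None  # optional: some invoices state no due date


def parse_invoice(data):
    """Build an Invoice, insisting on required fields and tolerating absent ones."""
    if data.get("invoice_number") is None:
        raise ValueError("invoice_number is required")
    return Invoice(
        invoice_number=data["invoice_number"],
        due_date=data.get("due_date"),
    )


def due_label(invoice):
    """Handle absence explicitly instead of assuming the key exists."""
    if invoice.due_date is None:
        return "no due date stated"
    return f"due {invoice.due_date}"
```

The absence check lives in one obvious place, so a missing due date becomes a handled case rather than a missing-key error deep in downstream code.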

Mistake 7: Trusting the Model With Numbers and Dates

Models are language predictors, not calculators. Teams ask them to sum line items, compute a date difference, or convert currencies inside the structured output.

Why it happens: It is tempting to offload arithmetic to the same call that does extraction.

What it costs: Subtly wrong totals and off-by-one dates that look right and pass validation, then surface as accounting discrepancies or scheduling bugs.

The fix: Have the model extract the raw inputs—the line items, the two dates—and do the arithmetic in your own code where it is deterministic and testable.
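For instance, assuming the model returns line items with a string unit_price and an integer quantity, plus dates as ISO 8601 strings (all hypothetical field conventions), the arithmetic can be done with the standard library:

```python
from datetime import date
from decimal import Decimal


def invoice_total(line_items):
    """Sum extracted line items with exact decimal arithmetic."""
    return sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in line_items),
        Decimal("0"),
    )


def days_between(start_iso, end_iso):
    """Number of days from start to end, both given as YYYY-MM-DD strings."""
    return (date.fromisoformat(end_iso) - date.fromisoformat(start_iso)).days


items = [
    {"unit_price": "19.99", "quantity": 3},
    {"unit_price": "5.00", "quantity": 1},
]
print(invoice_total(items))                      # 64.97
print(days_between("2024-02-27", "2024-03-01"))  # 3 (2024 is a leap year)
```

Keeping prices as strings until they reach Decimal avoids binary floating-point rounding, and both functions can be unit tested without calling a model.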

The Pattern Behind All Seven

Step back and every mistake shares a root cause: treating the model's output as trustworthy by default. The teams that avoid these problems adopt the opposite posture. They assume the output is suspect until proven otherwise, they keep a human or deterministic code in the loop for anything the model is bad at, and they build the failure paths before they are needed.

That mindset—untrusted input, defensive handling—is the throughline of every reliable structured-output system. The specific fixes above are just that posture made concrete.

Frequently Asked Questions

If schema enforcement is so strict, why do I still see bad data?

Because enforcement controls structure, not meaning. The model can return a value that is the correct type and passes every structural check while still being factually or logically wrong for your use case. Enforcement guarantees the container; it cannot guarantee the contents make sense, which is why semantic validation remains essential.

Which of these mistakes is most expensive in practice?

Trusting valid JSON to mean correct data, because it is the most invisible. The data looks clean, flows into storage, and the error only surfaces when something downstream produces a wrong result. By then the root cause is buried under several layers of processing, making it the hardest to trace and the costliest to fix.

How do I know if my schema is too large?

Watch your accuracy per field as you add fields. If accuracy on existing fields drops when you add new ones, the schema is competing for the model's attention. A good sign is when every field in the schema is actually consumed by your application; speculative fields you might use someday are the first thing to cut.
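Tracking this could be as simple as the sketch below, which assumes you have a small labeled set: a list of model outputs and a parallel list of hand-checked expected records. The function name and data shapes are assumptions for this example.

```python
def per_field_accuracy(predictions, labels):
    """Fraction of records where each labeled field was extracted correctly."""
    counts = {}  # field -> (hits, total)
    for pred, gold in zip(predictions, labels):
        for field, want in gold.items():
            hits, total = counts.get(field, (0, 0))
            counts[field] = (hits + (pred.get(field) == want), total + 1)
    return {field: hits / total for field, (hits, total) in counts.items()}
```

Run it on the same labeled set before and after adding fields; if the numbers for the existing fields fall, the new fields are costing more than they appear to.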

Should the model ever do math in structured output?

No. Treat the model as an extractor, not a calculator. Have it pull the raw numbers and dates into fields, then compute totals, differences, and conversions in your own deterministic code. Model arithmetic is plausible-looking and intermittently wrong, which is the worst combination because it passes casual review.

How do I decide if a field should be required?

Make a field required only if the model can determine its value from every legitimate input. If there are legitimate inputs where the value genuinely does not exist, the field must be optional, and your code must handle its absence. Marking a sometimes-absent field as required is what produces hallucinated filler values.

Key Takeaways

Structured output usually fails intermittently, so problems hide until they corrupt something downstream.
Valid JSON is not correct data; always validate semantics after parsing.
Keep the schema and validator in sync by deriving both from one source.
Write field descriptions, request only the fields you use, and decide required-versus-optional deliberately.
Never let the model do arithmetic; extract raw values and compute in deterministic code.

Agency Script Editorial
