Structured Output in the Wild: Six Scenarios Dissected

Principles only become intuition once you see them play out in specifics. This piece walks through six scenarios where structured output was the right tool, showing what each schema looked like and the one design decision that determined whether the result was usable. Some succeeded immediately and some stumbled first; both are instructive.

The scenarios span the three jobs structured output does most: pulling data out of messy text, sorting inputs into fixed categories, and deciding which function to call. Across all six, you will notice the same handful of choices recurring, which is the point. The patterns generalize even though the domains do not.

For the underlying principles behind these decisions, the best practices guide is the reference. Here we stay concrete.

Scenario 1: Parsing Resumes Into Candidate Records

A recruiting team needed to turn uploaded resumes—every one formatted differently—into uniform records with name, years of experience, skills, and most recent role.

What Worked

The winning move was an optional-by-default schema. Resumes vary wildly; many omit graduation dates or never state years of experience explicitly. When the team initially marked these fields required, the model invented values to fill them, producing confident fabrications. Switching the uncertain fields to optional, with descriptions like "leave null if not stated," eliminated the hallucinated filler.

The lesson: when input completeness varies, required fields force the model to lie. Optionality is honesty.
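A minimal sketch of such a schema, assuming JSON Schema as the schema language and Python on the application side. The field names and descriptions are hypothetical. Optionality is expressed here as a nullable type rather than by dropping the field from `required`; some providers' strict structured-output modes require every property to be listed as required, so a nullable type is the more portable way to let the model say "not stated".

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical candidate-record schema. Uncertain fields are nullable, and
# their descriptions tell the model to return null instead of guessing.
CANDIDATE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Candidate's full name."},
        "years_of_experience": {
            "type": ["integer", "null"],
            "description": "Total years of professional experience. "
            "Leave null if not stated; do not estimate.",
        },
        "skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Skills explicitly listed. Empty array if none.",
        },
        "most_recent_role": {
            "type": ["string", "null"],
            "description": "Most recent job title. Leave null if not stated.",
        },
    },
    "required": ["name", "years_of_experience", "skills", "most_recent_role"],
    "additionalProperties": False,
}


@dataclass
class CandidateRecord:
    name: str
    years_of_experience: Optional[int] = None
    skills: List[str] = field(default_factory=list)
    most_recent_role: Optional[str] = None


def parse_candidate(raw_json: str) -> CandidateRecord:
    """Load model output, keeping absent values as None rather than filling them."""
    data = json.loads(raw_json)
    return CandidateRecord(
        name=data["name"],
        years_of_experience=data.get("years_of_experience"),
        skills=list(data.get("skills") or []),
        most_recent_role=data.get("most_recent_role"),
    )
```

Downstream code can then distinguish "not stated" (`None`) from a real value, instead of trusting a number the model was forced to invent.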

Scenario 2: Classifying Support Tickets

A support team wanted incoming tickets sorted into one of five categories plus an urgency level, to route them automatically.

What Worked

The schema used a strict enum for category and a separate enum for urgency. The decisive detail was the field descriptions. The first version had bare enums and the model scattered tickets across categories inconsistently. Adding precise descriptions—"choose 'billing' only for payment or invoice issues, not for account access"—lifted consistency sharply.
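A sketch of the two enums with descriptive guidance, assuming JSON Schema. Only the `billing` rule comes from the scenario; the other category names, the urgency levels, and their descriptions are hypothetical placeholders for a real team's definitions. Because a plain JSON Schema `enum` carries no per-value descriptions, the rules live in the field's `description`. The small check on the application side rejects anything outside the enums, in case the provider does not enforce the schema.

```python
import json

# Hypothetical labels; a real deployment would use its own five categories.
CATEGORIES = ["billing", "account_access", "bug_report", "feature_request", "other"]
URGENCY_LEVELS = ["low", "medium", "high"]

TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": CATEGORIES,
            "description": (
                "Choose 'billing' only for payment or invoice issues, not for account access. "
                "Choose 'account_access' for login, password, or permission problems. "
                "Choose 'bug_report' when something in the product behaves incorrectly. "
                "Choose 'feature_request' when the user asks for new functionality. "
                "Choose 'other' only if none of the above apply."
            ),
        },
        "urgency": {
            "type": "string",
            "enum": URGENCY_LEVELS,
            "description": (
                "'high' if the user is blocked or losing money now; "
                "'medium' if degraded but usable; 'low' for questions and requests."
            ),
        },
    },
    "required": ["category", "urgency"],
    "additionalProperties": False,
}


def validate_ticket(raw_json: str) -> dict:
    """Reject labels outside the enums instead of routing them blindly."""
    data = json.loads(raw_json)
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"Unknown category: {data.get('category')!r}")
    if data.get("urgency") not in URGENCY_LEVELS:
        raise ValueError(f"Unknown urgency: {data.get('urgency')!r}")
    return data
```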

What Stumbled

Initially the team also asked the model to return a free-text "summary" in the same call. The summaries themselves were fine, but mixing a generative field with classification fields slightly degraded classification accuracy, apparently because the model's attention was split between the two tasks. Moving the summary into a separate call restored classification quality. The common mistakes piece covers this attention-dilution effect.
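One way to structure the split, sketched under the assumption that `call_model` is a hypothetical client function that takes a prompt and a JSON Schema and returns parsed JSON; swap in your provider's actual client. The classification schema carries no free-text field, so nothing competes with the labels. The category and urgency values are hypothetical placeholders.

```python
from typing import Any, Callable, Dict

# Hypothetical client signature: (prompt, json_schema) -> parsed JSON dict.
ModelCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Classification-only schema: no generative field alongside the labels.
CLASSIFY_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "account_access", "bug_report", "feature_request", "other"],
        },
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["category", "urgency"],
    "additionalProperties": False,
}

# The generative field lives in its own schema and its own call.
SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],
    "additionalProperties": False,
}


def process_ticket(ticket_text: str, call_model: ModelCall) -> Dict[str, Any]:
    """Classify and summarize a ticket in two separate, single-purpose calls."""
    labels = call_model(f"Classify this support ticket:\n\n{ticket_text}", CLASSIFY_SCHEMA)
    summary = call_model(f"Summarize this support ticket in one sentence:\n\n{ticket_text}", SUMMARY_SCHEMA)
    return {**labels, "summary": summary["summary"]}
```

The cost is a second request per ticket; the benefit is that each call has a single job. Since the two calls do not depend on each other, they can also run concurrently if latency matters.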

Scenario 3: Extracting Line Items From Invoices

A finance workflow needed to read scanned invoices and produce a list of line items with description, quantity, unit price, and a total.

What Worked

Extraction of the raw fields—description, quantity, unit price—was reliable. The schema modeled line items as an array of objects, which the model handled cleanly.

What Stumbled

The team's first version also asked the model to compute the total for each line and the grand total. The arithmetic was intermittently wrong: a line with a quantity of 3 at a unit price of 12.50 came back with a total of 36.50 instead of 37.50, and errors like that occurred often enough to matter. The fix was to drop the total fields from the model's job entirely and compute every total in application code from the extracted quantities and prices. Models extract; code calculates.
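A minimal sketch of the "models extract; code calculates" split, assuming JSON Schema for the extraction and Python's `decimal` module for the math. The schema models line items as an array of objects holding only the raw fields the model reads off the page; there is deliberately no total field for it to get wrong. The field names and `compute_totals` are hypothetical.

```python
import json
from decimal import ROUND_HALF_UP, Decimal
from typing import List, Tuple

# Hypothetical extraction schema: an array of line-item objects holding only
# raw operands. No per-line total or grand total is requested from the model.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {
                        "type": "number",
                        "description": "Price per unit as printed, without currency symbol.",
                    },
                },
                "required": ["description", "quantity", "unit_price"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["line_items"],
    "additionalProperties": False,
}

CENT = Decimal("0.01")


def compute_totals(raw_json: str) -> Tuple[List[Decimal], Decimal]:
    """Compute per-line totals and the grand total deterministically."""
    # parse_float=Decimal keeps prices like 12.50 exact instead of binary floats.
    data = json.loads(raw_json, parse_float=Decimal)
    line_totals = []
    for item in data["line_items"]:
        qty = Decimal(item["quantity"])  # whole numbers arrive as int; convert explicitly
        price = Decimal(item["unit_price"])
        line_totals.append((qty * price).quantize(CENT, rounding=ROUND_HALF_UP))
    grand_total = sum(line_totals, Decimal("0"))
    return line_totals, grand_total
```

For the line from the scenario (quantity 3, unit price 12.50), `compute_totals` returns a line total of `Decimal("37.50")` every time, because the multiplication never passes through the model.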

Scenario 4: Routing With Function Calling

A travel assistant needed to decide, per user message, whether to search flights, search hotels, or ask a clarifying question—and produce the arguments for whichever it chose.

What Worked

This is structured output as control flow. Each function was a schema; the model selected one and filled its arguments. The crucial design choice was including an explicit "ask clarification" function as a first-class option. Without it, the model forced ambiguous requests into a flight or hotel search with guessed arguments. Giving it a legitimate "I am not sure" path stopped the bad guesses.
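A sketch of the three tool definitions and a dispatcher, assuming a generic tool format of a name, a description, and JSON Schema parameters; each provider wraps tools slightly differently, so treat the outer shape as illustrative. The tool names and argument fields are hypothetical. The important detail is that `ask_clarification` sits alongside the search tools as an equal option, with a description telling the model when to prefer it.

```python
import json

# Hypothetical tool definitions: name, description, and JSON Schema parameters.
# The outer wrapper differs between providers; this shape is illustrative.
TOOLS = [
    {
        "name": "search_flights",
        "description": "Search flights. Use only when origin, destination, and date are all stated.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "Departure date, YYYY-MM-DD."},
            },
            "required": ["origin", "destination", "date"],
        },
    },
    {
        "name": "search_hotels",
        "description": "Search hotels. Use only when city and stay dates are all stated.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "check_in": {"type": "string", "description": "YYYY-MM-DD."},
                "check_out": {"type": "string", "description": "YYYY-MM-DD."},
            },
            "required": ["city", "check_in", "check_out"],
        },
    },
    {
        "name": "ask_clarification",
        "description": (
            "Ask the user a question when the request is ambiguous or missing "
            "details the other tools need. Never guess arguments instead."
        ),
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
]


def dispatch(tool_name: str, arguments_json: str) -> dict:
    """Route the model's chosen tool call to an action."""
    args = json.loads(arguments_json)
    if tool_name == "ask_clarification":
        # Uncertainty is a first-class outcome: surface the question
        # instead of running a search with guessed arguments.
        return {"action": "reply", "text": args["question"]}
    if tool_name in ("search_flights", "search_hotels"):
        # Placeholder: a real system would call its flight or hotel API here.
        return {"action": "search", "tool": tool_name, "args": args}
    raise ValueError(f"Unknown tool: {tool_name}")
```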

The general principle: when the model must choose, give it an escape hatch for uncertainty, or it will fake confidence.

Scenario 5: Normalizing Free-Text Survey Responses

A research team had thousands of open-ended survey answers and wanted each tagged with sentiment and the themes it touched.

What Worked

Sentiment as a three-value enum was trivially reliable. Themes were harder because the set was open-ended. The team's solution was a hybrid: a fixed enum of known themes plus an optional free-text "other_theme" field for responses that did not fit. This balanced consistency (most responses hit the enum) with coverage (novel themes were not lost).

The Broader Lesson

A pure enum would have discarded genuine new themes; pure free text would have produced inconsistent tags impossible to aggregate. The hybrid captured the best of both. When your categories are mostly-but-not-entirely known, model the "mostly" as an enum and the "not entirely" as an optional escape field.
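A sketch of the hybrid schema and the aggregation it makes possible, assuming JSON Schema. The three sentiment labels and the known-theme list are hypothetical placeholders. Enum themes can be counted directly, while values from the `other_theme` escape field are collected separately for review.

```python
import json
from collections import Counter
from typing import List, Tuple

# Hypothetical theme list; a real team would use its own known themes.
KNOWN_THEMES = ["pricing", "usability", "performance", "support", "documentation"]

SURVEY_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "themes": {
            "type": "array",
            "items": {"type": "string", "enum": KNOWN_THEMES},
            "description": "Every known theme the response touches. Empty array if none.",
        },
        "other_theme": {
            "type": ["string", "null"],
            "description": "Short label for a theme not in the list. Null if all themes fit.",
        },
    },
    "required": ["sentiment", "themes", "other_theme"],
    "additionalProperties": False,
}


def aggregate(raw_responses: List[str]) -> Tuple[Counter, List[str]]:
    """Count known themes for reporting; collect escape-field themes for review."""
    counts: Counter = Counter()
    novel: List[str] = []
    for raw in raw_responses:
        data = json.loads(raw)
        counts.update(data["themes"])
        if data.get("other_theme"):
            novel.append(data["other_theme"])
    return counts, novel
```

Reviewing the `novel` list periodically is one way to spot a theme that recurs often enough to be worth adding to the enum.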

Scenario 6: Generating Structured Product Filters From Search Queries

An e-commerce team wanted to turn a shopper's natural-language query—"cheap waterproof hiking boots under 100 dollars in size 10"—into a set of structured filters their catalog API could apply: category, price ceiling, attributes, and size.

What Worked

The schema modeled each filter as its own typed field: category as an enum, max_price as an optional number, attributes as an array drawn from a known list, and size as an optional string. The decisive choice was making every filter optional. Shoppers mention some attributes and omit others, and forcing the model to populate every filter produced phantom constraints—an inferred color the shopper never asked for, narrowing results wrongly.

With optionality, the model populated only the filters the query actually expressed and left the rest null, which the catalog API correctly treated as "no constraint." The same optionality lesson from the resume scenario reappeared in a completely different domain.

What Stumbled

The first version let the model map free-text attributes directly, so "waterproof" sometimes came back as "water-resistant" or "water proof," none of which matched the catalog's exact attribute values. Constraining attributes to a strict enum of the catalog's real attribute names fixed it: the model now had to choose from valid values rather than paraphrase. This mirrors the support-ticket scenario, where strict enums plus good descriptions delivered consistency that free text could not.
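A sketch combining both fixes, assuming JSON Schema and Python. Every filter is nullable, and `attributes` is an array constrained to the catalog's exact values. The category names and attribute vocabulary are hypothetical; in practice they could be generated from the catalog itself so the enum never drifts from what the API accepts. `to_catalog_query` is a hypothetical helper that double-checks the vocabulary (useful if your provider does not enforce enums during decoding) and drops unset filters, which is harmless if, as in the scenario, the API already treats null as no constraint.

```python
import json

# Hypothetical catalog vocabulary; ideally generated from the catalog itself.
CATEGORIES = ["hiking_boots", "running_shoes", "sandals"]
CATALOG_ATTRIBUTES = ["waterproof", "insulated", "lightweight", "wide_fit"]

FILTER_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": ["string", "null"], "enum": CATEGORIES + [None]},
        "max_price": {
            "type": ["number", "null"],
            "description": "Price ceiling only if the shopper states one; otherwise null.",
        },
        "attributes": {
            "type": "array",
            "items": {"type": "string", "enum": CATALOG_ATTRIBUTES},
            "description": "Only attributes the shopper asked for. Empty array if none.",
        },
        "size": {
            "type": ["string", "null"],
            "description": "Size exactly as stated, e.g. '10'. Null if not mentioned.",
        },
    },
    "required": ["category", "max_price", "attributes", "size"],
    "additionalProperties": False,
}


def to_catalog_query(raw_json: str) -> dict:
    """Validate against the catalog vocabulary and keep only expressed filters."""
    data = json.loads(raw_json)
    unknown = [a for a in data.get("attributes", []) if a not in CATALOG_ATTRIBUTES]
    if unknown:
        # A paraphrase like "water proof" would silently match nothing downstream.
        raise ValueError(f"Attributes not in catalog: {unknown}")
    if data.get("category") is not None and data["category"] not in CATEGORIES:
        raise ValueError(f"Unknown category: {data['category']!r}")
    # Null or empty means "no constraint", so omit it rather than narrowing results.
    return {key: value for key, value in data.items() if value is not None and value != []}
```

For the example query, a well-formed response would look like `{"category": "hiking_boots", "max_price": 100, "attributes": ["waterproof"], "size": "10"}`, which passes through unchanged; a response containing `"water proof"` is rejected instead of quietly returning no results.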

What the Scenarios Share

Read together, the scenarios rhyme. Optional fields prevent fabrication when input varies. Field descriptions drive classification consistency. Arithmetic belongs in code, not the model. An uncertainty escape hatch stops confident guessing. Strict enums beat free text when values must match a known set. And hybrid enum-plus-freetext designs handle the mostly-known case.

None of these are domain-specific tricks. They are the same structured-output instincts, surfacing in different clothes. If you internalize the pattern, you can predict the right schema design for a new problem before you write a line of it. The framework generalizes these instincts into a reusable model.

Frequently Asked Questions

Why did marking fields required cause hallucinations in the resume example?

Because a required field tells the model the value must exist, and when the input does not contain it, the model satisfies the requirement by inventing something plausible. Marking such fields optional, with instructions to leave them null when absent, gives the model permission to admit the data is not there rather than fabricating it.

When should I split a generative field into a separate call?

When it sits alongside classification or extraction fields and you notice their accuracy slipping. Generative fields draw on different model behavior than tight structured fields, and combining them can split the model's attention. If the structured fields are what you most need to be reliable, isolate the generative one into its own call.

Should arithmetic ever stay in the model's structured output?

Not for anything that matters. Models produce plausible numbers, not guaranteed-correct ones, so totals and conversions come back intermittently wrong in ways that pass casual review. Extract the raw operands into fields and compute in deterministic code. This is the single most reliable rule across extraction scenarios.

What is the value of an explicit clarification option in function calling?

It gives the model a legitimate path when the user's intent is ambiguous, instead of forcing a confident-but-wrong choice. Without it, the model will pick some function and guess the arguments, producing actions the user did not request. The escape hatch converts silent bad guesses into honest requests for more information.

How do I handle categories that are mostly known but not complete?

Use a hybrid schema: a fixed enum for the known categories plus an optional free-text field for cases that fall outside it. Most inputs land in the enum, keeping your tags consistent and aggregatable, while the escape field ensures genuinely novel categories are captured rather than forced into a wrong bucket.

Key Takeaways

When input completeness varies, optional fields prevent the model from fabricating values to fill required slots.
Precise field descriptions are what make classification consistent; bare enums scatter inconsistently.
Keep arithmetic out of the model—extract raw operands and compute totals in deterministic code.
In function-calling routing, give the model an explicit uncertainty option so it stops guessing confidently.
For mostly-known category sets, combine a fixed enum with an optional free-text escape field.


Agency Script Editorial
