Five Extraction Scenarios and Why Each One Worked

Agency Script Editorial

Editorial Team

January 31, 2023 · 7 min read

Principles only become useful when you see them collide with real documents. A rule about handling missing values sounds obvious until you face an invoice where the due date is implied rather than stated, and a rule about disambiguation sounds academic until a contract gives you three dates and asks you to pick one. This article walks through seven concrete extraction scenarios, each drawn from a common document type, and shows the specific decision that determined whether the extraction worked.

The point is not to provide copy-paste prompts, because your documents differ from these. The point is to make the reasoning visible: what about each document created risk, what instruction addressed it, and what would have gone wrong without that instruction. Read these the way a chess player reads annotated games, watching for the move that mattered.

Across all seven, the same handful of techniques keeps reappearing: schema-first design, explicit missing-value rules, disambiguation for competing candidates, and raw extraction over normalization. Seeing them applied repeatedly is how they move from rules you have read to instincts you can use.

Invoices: The Implied Due Date

An invoice is the canonical extraction target, and the dangerous field is the one that is not always printed.

The Problem and the Fix

Many invoices state an invoice date and payment terms ("net 30") but no explicit due date. A naive prompt behaves inconsistently: sometimes it leaves the due date null, and sometimes it invents one. The fix was a rule: "If no due date is stated but terms are given, leave due date null; do not calculate it." Calculation belongs in code, not the model. With that rule, the model stopped fabricating dates and the downstream system computed them deterministically.

  • Extracted: invoice number, invoice date, terms, subtotal, tax, total
  • Left to code: due date calculation from terms
  • Result: zero fabricated dates across the sample
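
The "left to code" step can be sketched in a few lines of Python. This is a minimal illustration, not the system described above: the function name `compute_due_date` and the field names in the sample result are assumptions, and it only recognizes terms of the form "net N".

```python
import re
from datetime import date, timedelta
from typing import Optional


def compute_due_date(invoice_date: date, terms: Optional[str]) -> Optional[date]:
    """Return the due date for 'net N' terms, or None if terms are missing or unrecognized."""
    if not terms:
        return None
    match = re.fullmatch(r"\s*net\s*(\d+)\s*", terms, flags=re.IGNORECASE)
    if match is None:
        # Unknown terms: return None so a human or another rule can decide.
        return None
    return invoice_date + timedelta(days=int(match.group(1)))


# Hypothetical model output: due_date is null because the invoice did not print one.
extracted = {"invoice_date": "2023-01-31", "terms": "Net 30", "due_date": None}

due = compute_due_date(date.fromisoformat(extracted["invoice_date"]), extracted["terms"])
print(due)  # 2023-03-02
```

Because the function returns None for terms it does not understand, unusual terms surface as gaps to review instead of as guessed dates.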

Resumes: Lists Inside Documents

Resumes test how a prompt handles repeated structures like jobs and degrees.

The Problem and the Fix

A resume contains multiple work experiences, each with a title, company, and dates. A flat schema cannot capture them; the fix was a nested list with a defined object shape per entry. Specifying that each experience be an object with fixed keys produced consistent arrays instead of a jumble. The broader pattern of designing for repeated structures is covered in A Framework for Prompting for Data Extraction.
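
One way to enforce the fixed shape is to check the model's output before it goes any further. The sketch below assumes the model returns JSON with an `experiences` list; the key names (`title`, `company`, `start_date`, `end_date`) are illustrative choices, not a schema taken from the original project.

```python
import json
from typing import Any, Dict, List

# Assumed fixed shape for one work-experience entry.
EXPERIENCE_KEYS = {"title", "company", "start_date", "end_date"}


def validate_experiences(raw_json: str) -> List[Dict[str, Any]]:
    """Parse the model output and confirm every entry has exactly the expected keys."""
    data = json.loads(raw_json)
    experiences = data.get("experiences")
    if not isinstance(experiences, list):
        raise ValueError("experiences must be a list")
    for index, entry in enumerate(experiences):
        if not isinstance(entry, dict) or set(entry) != EXPERIENCE_KEYS:
            raise ValueError(f"entry {index} does not match the fixed shape")
    return experiences


# Hypothetical model output with two entries; a missing end date is null, not omitted.
sample_output = json.dumps({
    "experiences": [
        {"title": "Analyst", "company": "Acme", "start_date": "2019-06", "end_date": "2021-03"},
        {"title": "Manager", "company": "Globex", "start_date": "2021-04", "end_date": None},
    ]
})

print(len(validate_experiences(sample_output)))  # 2
```

Requiring every key on every entry, with null for absent values, keeps the array uniform, so downstream code never has to guess whether a key is missing or just empty.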

Contracts: Competing Dates

Long contracts are where disambiguation rules prove their worth.

The Problem and the Fix

A service contract contained an effective date, an execution date, and a renewal date, and the early prompt returned whichever it found first. The fix was an explicit rule per field: "effective_date is the date the agreement takes effect, stated near the term clause." Naming the meaning rather than the position turned random selection into reliable extraction. The cost of skipping these rules is detailed in 7 Common Mistakes with Prompting for Data Extraction (and How to Avoid Them).
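
A simple way to keep per-field meanings explicit is to store them as data and build the prompt from them. In this sketch only the `effective_date` wording comes from the rule quoted above; the definitions for the other two dates, the function name `build_prompt`, and the surrounding prompt text are assumptions for illustration.

```python
# Field meanings kept as data so each date is defined by role, not by position.
FIELD_DEFINITIONS = {
    "effective_date": "the date the agreement takes effect, stated near the term clause",
    # The two definitions below are assumed wording, not taken from the original prompt.
    "execution_date": "the date the parties signed the agreement",
    "renewal_date": "the date on which the agreement renews",
}


def build_prompt(document_text: str) -> str:
    """Assemble an extraction prompt that names the meaning of every date field."""
    lines = [
        "Extract the following fields from the contract as JSON.",
        "Use null for any field the document does not state.",
        "",
    ]
    for name, meaning in FIELD_DEFINITIONS.items():
        lines.append(f"- {name}: {meaning}")
    lines += ["", "Contract:", document_text]
    return "\n".join(lines)


print(build_prompt("This Agreement is effective as of ..."))
```

Keeping the definitions in one dictionary also makes them easy to review and revise when a new contract layout exposes an ambiguity.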

Product Reviews: Sentiment Plus Structure

Reviews mix subjective judgment with extractable facts, and conflating them causes trouble.

The Problem and the Fix

The goal was to extract a star rating, the product name, and a sentiment label. The model occasionally inferred a rating from tone when none was given. The fix separated the two tasks: extract the explicit numeric rating with a null rule, and label sentiment as a distinct field clearly marked as a judgment. Keeping the factual extraction separate from the interpretive label kept the numbers honest.
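
The separation can be made visible in the output type itself. The following sketch assumes a result dictionary with the keys `product_name`, `stated_rating`, and `sentiment`, and a three-value sentiment label set; both are hypothetical choices, not the original schema.

```python
from dataclasses import dataclass
from typing import Optional

SENTIMENT_LABELS = {"positive", "neutral", "negative"}  # assumed label set


@dataclass
class ReviewExtraction:
    product_name: Optional[str]
    stated_rating: Optional[float]  # fact: only a rating the review explicitly gives
    sentiment: str                  # judgment: the model's reading of the tone


def parse_review(result: dict) -> ReviewExtraction:
    """Validate the factual and interpretive fields independently."""
    rating = result.get("stated_rating")
    if rating is not None and (isinstance(rating, bool) or not isinstance(rating, (int, float))):
        raise ValueError("stated_rating must be a number or null")
    sentiment = result.get("sentiment")
    if sentiment not in SENTIMENT_LABELS:
        raise ValueError(f"sentiment must be one of {sorted(SENTIMENT_LABELS)}")
    return ReviewExtraction(result.get("product_name"), rating, sentiment)


# A glowing review with no stars given: the rating stays null, the sentiment is still labeled.
review = parse_review({"product_name": "Kettle", "stated_rating": None, "sentiment": "positive"})
print(review.stated_rating, review.sentiment)  # None positive
```

A null rating next to a positive sentiment is a valid result here, which is exactly the case where a merged field would have tempted the model to invent a number.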

Meeting Transcripts: Finding the Signal

Transcripts are long, noisy, and full of irrelevant chatter, which stresses a prompt's focus.

The Problem and the Fix

Extracting action items and decisions from a transcript produced false positives, with the model tagging casual remarks as decisions. The fix was a tighter definition: "A decision is a statement where the group commits to a course of action, not a suggestion or a question." Defining the target precisely cut the false positives sharply. The step-by-step process behind building such prompts is laid out in A Step-by-Step Approach to Prompting for Data Extraction.

Receipts and Expense Reports: Format Chaos

Receipts arrive in every imaginable layout, which makes them a stress test for a prompt's robustness to formatting.

The Problem and the Fix

A batch of expense receipts mixed printed thermal slips, photographed handwriting, and emailed confirmations, each placing the total in a different spot with a different label. An early prompt that looked for a field labeled "total" missed receipts that said "amount due" or "balance." The fix was to describe the target by meaning rather than by a single label: "the final amount charged, after tax and tip." Defining the field semantically let the model find it regardless of wording, and a null rule prevented invented totals when a receipt was too faded to read. The variety here underscores why tuning only on clean samples fails, a trap detailed in 7 Common Mistakes with Prompting for Data Extraction (and How to Avoid Them).

Support Tickets: Classification Plus Extraction

Support tickets combine two tasks that are tempting to merge but better kept separate.

The Problem and the Fix

The goal was to extract the customer's account ID and product, and also to classify the ticket's category. Combining classification and extraction in one loose instruction produced inconsistent categories that drifted between runs. The fix was to restrict the category to a closed list and extract it as a distinct field: "category must be one of: billing, technical, account, other." Constraining the output to an enumerated set turned a free-form guess into a reliable label, while the account ID and product were extracted with their own null rules. The discipline of separating distinct concerns is the heart of A Framework for Prompting for Data Extraction.
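
A closed list is easiest to keep honest when the same definition drives both the prompt and the validation. The sketch below uses Python's standard `enum` module with the four categories quoted above; the names `Category`, `CATEGORY_INSTRUCTION`, and `parse_category` are illustrative.

```python
from enum import Enum


class Category(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    OTHER = "other"


# The prompt instruction is generated from the enum, so the two cannot drift apart.
CATEGORY_INSTRUCTION = "category must be one of: " + ", ".join(c.value for c in Category)


def parse_category(value: object) -> Category:
    """Accept only a value from the closed list; reject anything else."""
    if not isinstance(value, str):
        raise ValueError("category must be a string")
    try:
        return Category(value)
    except ValueError:
        raise ValueError(f"unexpected category: {value!r}") from None


print(CATEGORY_INSTRUCTION)  # category must be one of: billing, technical, account, other
print(parse_category("billing").value)  # billing
```

Rejecting out-of-list values outright, rather than mapping them to "other", makes drift visible in logs instead of hiding it in the data.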

What the Scenarios Share

Across seven document types, the same small toolkit kept deciding the outcome, which is the real lesson worth carrying away.

The Recurring Techniques

The wins came from describing fields by meaning rather than by label or position, constraining outputs to closed sets where possible, separating factual extraction from interpretation, and always providing a rule for absent values. None of these is exotic, and that is the point: reliable extraction is a handful of techniques applied consistently, not a different trick for every document. Seeing them recur across invoices, resumes, contracts, reviews, transcripts, receipts, and tickets is what turns them from rules you have read into instincts you reach for.

Frequently Asked Questions

Why not let the model calculate the invoice due date?

Because calculation from terms is deterministic logic that belongs in code, where it is explicit and testable, rather than in a model that may apply it inconsistently or fabricate a date when terms are ambiguous. Extracting the stated fields and computing the due date in code gives you a correct, auditable result every time, whereas asking the model to compute it introduces a silent source of error.

How do I extract repeating structures like a list of jobs?

Define a nested schema where the repeating element is a list of objects, each with a fixed set of keys. Specify the exact shape of one entry so the model knows to produce a consistent array rather than free-form text. Providing an example with two entries reinforces the pattern. This approach reliably captures resumes, line items, and any document with repeated sub-records.

What is the best way to handle a document with several dates?

Disambiguate by meaning rather than position. For each date field, describe what it represents and where it typically appears, such as the effective date stated near the term clause. Naming the semantic role turns the model's choice from a guess based on order into a reliable extraction. Avoid rules like first date or last date, which break the moment a document is laid out differently.

Can one prompt handle all these document types?

Generally no, because these documents differ fundamentally in structure and the fields they contain. A better approach is a separate, tuned prompt per document type, with a lightweight classification step routing each document to the right prompt. Within a single type, varied samples and clear edge-case rules handle the natural variation, but forcing one prompt across invoices, resumes, and transcripts degrades all of them.
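
The routing idea can be sketched as a lookup keyed by document type. Everything named here is hypothetical: the prompt texts are placeholders, and `keyword_classifier` is a toy stand-in for whatever lightweight classification step you actually use.

```python
from typing import Callable, Dict

# Placeholder prompts; in practice each would be a tuned, type-specific prompt.
PROMPTS: Dict[str, str] = {
    "invoice": "Extract invoice number, invoice date, terms, subtotal, tax, total.",
    "resume": "Extract work experiences as a list of fixed-shape objects.",
    "transcript": "Extract decisions and action items.",
}


def keyword_classifier(text: str) -> str:
    """Toy classifier used only to make the example runnable."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "experience" in lowered:
        return "resume"
    return "unknown"


def route(document_text: str, classify: Callable[[str], str]) -> str:
    """Pick the prompt for the document's type, failing loudly on unknown types."""
    doc_type = classify(document_text)
    if doc_type not in PROMPTS:
        raise ValueError(f"no prompt configured for document type: {doc_type!r}")
    return PROMPTS[doc_type]


print(route("Invoice #1042, terms net 30", keyword_classifier))
```

Raising on an unknown type, instead of falling back to a generic prompt, keeps unsupported documents out of the pipeline until a prompt has been tuned for them.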

Key Takeaways

  • Leave deterministic calculations like due dates to code, not the model
  • Capture repeating structures with a nested list of fixed-shape objects
  • Disambiguate competing values by meaning, not by position in the document
  • Keep factual extraction separate from interpretive labels like sentiment
  • Define noisy targets precisely so the model does not tag casual remarks as signal
  • Use a separate tuned prompt per document type rather than one prompt for everything
