The EXTRACT Model for Turning Raw Documents Into Clean Output

Agency Script Editorial

Editorial Team

May 3, 2021 · 8 min read

Most teams approach document transformation as a single act: write a prompt, get a result. That works until the work matters, the volume grows, or someone has to maintain the prompt after the person who wrote it has moved on. At that point you need something more durable than a clever paragraph. You need a model that breaks the work into stages, names them, and tells you when each one applies.

This article introduces the EXTRACT model, a six-stage structure for prompting document transformation reliably. The name is a mnemonic, not a product: Examine, eXpress, Transform, Reconcile, Audit, and Control Throughput. The value is not the acronym but the discipline of moving through stages in order instead of collapsing them into a single hopeful instruction.

The stages map onto how skilled practitioners already work, whether or not they have named the steps. Making them explicit lets a team share the work, debug it, and improve one stage without disturbing the others.

Stage One: Examine the Source

Before any prompt is written, you study what you actually have. This stage is about understanding, not action.

What examination produces

  • A clear picture of the document's structure: sections, tables, recurring boilerplate.
  • A list of what must be preserved verbatim versus what is free to change.
  • An honest assessment of whether the document fits the context window.

Examination feels skippable because it produces no prompt. Skip it and you will discover the document's quirks at the worst time: in production, in front of a client.
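Part of examination can be scripted. The sketch below is one possible approach, not a prescribed tool: it lists likely headings, flags lines repeated often enough to be boilerplate, and estimates whether the text fits a context window. The heading pattern, the repeat threshold of three, the 8,000-token default, and the four-characters-per-token estimate are all assumptions to replace with what your own documents and model require.

```python
import re
from collections import Counter


def examine(text: str, context_window_tokens: int = 8000) -> dict:
    """Summarize the structure of a plain-text document before any prompt is written."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Assumption: headings look like "Section 3: Payment Terms" or "3. Payment Terms".
    heading_re = re.compile(r"^(Section\s+\d+\b.*|\d+\.\s+\S.*)$")
    headings = [ln for ln in lines if heading_re.match(ln)]

    # Lines repeated verbatim three or more times are boilerplate candidates.
    counts = Counter(lines)
    boilerplate = sorted(ln for ln, n in counts.items() if n >= 3)

    # Assumption: about 4 characters per token. Use the model's own tokenizer
    # when an exact count matters.
    estimated_tokens = len(text) // 4

    return {
        "headings": headings,
        "boilerplate": boilerplate,
        "estimated_tokens": estimated_tokens,
        "fits_context": estimated_tokens <= context_window_tokens,
    }
```

A report like this does not replace reading the document, but it gives the later stages a written record of what was found.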

Stage Two: eXpress the Contract

Here you write the output contract before the transformation logic. The contract is the precise shape of the result.

Contract elements

  • The literal output format, whether that is JSON, a structured memo, or a table.
  • Field names, types, and order, written exactly as you want them returned.
  • Rules for missing data, so the model fills gaps predictably instead of inventing.

Expressing the contract first forces a useful question: what does the consumer of this output actually need? A parser needs strictness; a reader needs clarity. The contract encodes that answer. Our pre-flight checklist for reliable document transformation prompts turns this stage into concrete line items.
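One way to keep the contract separate from the instruction is to hold it as data and render it to text. The following is a minimal sketch under that assumption. The `Field` class, the three example fields, and the wording of the rendered rules are illustrative, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str        # a JSON type description, e.g. "string" or "array of string"
    if_missing: str  # what the model must return when the source has no value


# Hypothetical contract for illustration. List order is the expected output order.
CONTRACT = [
    Field("title", "string", "null"),
    Field("effective_date", "string (YYYY-MM-DD)", "null"),
    Field("obligations", "array of string", "[]"),
]


def render_contract(fields: list[Field]) -> str:
    """Render the contract as text that a prompt can include verbatim."""
    lines = ["Return a single JSON object with exactly these keys, in this order:"]
    for i, f in enumerate(fields, start=1):
        lines.append(
            f"{i}. {f.name}: {f.type}; if absent from the source, return {f.if_missing}"
        )
    lines.append("Do not add keys. Do not guess values that the source does not state.")
    return "\n".join(lines)
```

Because the contract lives in one list, a change to a field name or a missing-data rule happens in one place, and the same list can later drive validation.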

Stage Three: Transform With the Core Instruction

Only now do you write the instruction that performs the work. By this point the hard thinking is done, so the prompt can be short.

Keeping the instruction focused

  • State the role and goal in one or two sentences.
  • Reference the contract from the previous stage rather than restating it loosely.
  • Add one worked example when the transformation involves judgment.

A focused instruction is easier to maintain. When the task changes, you usually adjust the contract or the example, not the core instruction, which keeps changes localized.
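The shape of a focused instruction can be sketched as a small assembly function. This is an illustration only: the role sentence, the section labels, and the `build_prompt` name are assumptions, and the contract text is passed in rather than rewritten inside the instruction.

```python
def build_prompt(contract_text: str, example_input: str,
                 example_output: str, document: str) -> str:
    """Assemble role and goal, the contract, one worked example, and the document."""
    parts = [
        # Role and goal in a single sentence.
        "You are a careful document analyst. Convert the document below "
        "into the output format described in the contract.",
        # The contract is included verbatim, not paraphrased.
        "CONTRACT:\n" + contract_text,
        # One worked example carries the judgment the task requires.
        "EXAMPLE INPUT:\n" + example_input,
        "EXAMPLE OUTPUT:\n" + example_output,
        "DOCUMENT:\n" + document,
    ]
    return "\n\n".join(parts)
```

When the task changes, the arguments change and the function does not, which is the localization described above.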

Stage Four: Reconcile Long Documents

When a document exceeds the context window, the transformation has to happen in pieces. Reconciliation is the stage that makes the pieces whole again.

Reconciliation tactics

  • Chunk along natural boundaries such as sections, not arbitrary character counts.
  • Carry a small overlap so context that spans a boundary survives.
  • Define how partial results merge, especially when fields appear in multiple chunks.

Reconciliation is where naive pipelines lose data quietly. Our guide to the trade-offs, options, and decisions in document transformation weighs chunking strategies against single-pass approaches in depth.
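The three tactics can be sketched in a few lines. This is a simplified illustration, assuming headings of the form "Section N" and per-chunk results that arrive as dictionaries. The merge rule shown (first non-null scalar wins, lists concatenate without duplicates) is one reasonable choice among several, not the only correct one.

```python
import re

# Assumption: every section starts with a line like "Section 4 ...".
HEADING = re.compile(r"^Section\s+\d+\b", re.MULTILINE)


def chunk_by_section(text: str, max_chars: int, overlap_chars: int = 200) -> list[str]:
    """Group whole sections into chunks, each prefixed with the tail of the previous one."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    sections = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]

    groups, current = [], ""
    for sec in sections:
        # Start a new group rather than split a section. A single section
        # longer than max_chars is kept whole in this sketch.
        if current and len(current) + len(sec) > max_chars:
            groups.append(current)
            current = ""
        current += sec
    if current:
        groups.append(current)

    chunks = []
    for i, group in enumerate(groups):
        # The overlap is added on top of max_chars, so budget for it.
        prefix = groups[i - 1][-overlap_chars:] if i > 0 and overlap_chars > 0 else ""
        chunks.append(prefix + group)
    return chunks


def merge_partials(partials: list[dict]) -> dict:
    """Merge per-chunk results: first non-null scalar wins, lists concatenate without duplicates."""
    merged: dict = {}
    for part in partials:
        for key, value in part.items():
            if isinstance(value, list):
                existing = merged.get(key) or []
                for item in value:
                    if item not in existing:
                        existing.append(item)
                merged[key] = existing
            elif merged.get(key) is None:
                merged[key] = value
    return merged
```

De-duplicating list items guards against the overlap producing the same entry twice, at the cost of collapsing two genuinely identical entries. If that matters for your documents, merge on position instead of text.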

Stage Five: Audit the Result

Auditing is verification with intent. You are not asking whether the output looks fine; you are checking it against the contract and the source.

Audit checks

  • Parse structured output programmatically rather than reading it.
  • Confirm preserved content matches the source character for character.
  • Hunt specifically for dropped list items and missing final sections.

This stage is where the transformation earns or loses trust. A transformation that passes its audit consistently is one you can automate. One that fails stays manual, no matter how clever the prompt.
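The audit checks lend themselves to code. The sketch below assumes the output is a JSON object and that you can count the expected list items independently of the model, for example from the source itself. The function name, its parameters, and the failure messages are illustrative.

```python
import json


def audit(raw_output: str, source: str, required_keys: list[str],
          verbatim_keys: list[str], expected_counts: dict[str, int]) -> list[str]:
    """Return a list of audit failures. An empty list means the output passed."""
    # Check 1: parse the output instead of reading it.
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["output is not a JSON object"]

    failures = []
    if list(record.keys()) != required_keys:
        failures.append("keys differ from the contract (names or order)")

    # Check 2: preserved content must match the source character for character.
    for key in verbatim_keys:
        value = record.get(key)
        if isinstance(value, str) and value not in source:
            failures.append(f"{key} does not appear verbatim in the source")

    # Check 3: compare list lengths with counts obtained independently.
    for key, expected in expected_counts.items():
        actual = len(record.get(key) or [])
        if actual != expected:
            failures.append(f"{key}: expected {expected} items, found {actual}")
    return failures
```

Returning every failure, not just the first, makes the log more useful when a record is later reviewed by hand.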

Stage Six: Control Throughput

The final stage governs running the transformation repeatedly and unattended.

Throughput controls

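  • Set temperature low, at or near zero, so that identical inputs produce the same output as consistently as the model allows for extraction tasks. A low temperature reduces variation but may not eliminate it, which is one more reason to audit every run.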
  • Build a fallback for failed audits: retry, escalate to a human, or quarantine.
  • Log every input and output so any run can be replayed and debugged.

Controlling throughput is what separates a demo from a system. Teams that reach this stage can run thousands of transformations with confidence, which is the point of the whole model.
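A fallback loop with logging might look like the sketch below. It assumes the model call and the audit are supplied as plain functions, with the temperature set inside the `transform` callable, which is a placeholder and not any particular vendor's API. Retrying once and then quarantining is an illustrative policy.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("transform")


def run_with_fallback(doc_id: str, document: str,
                      transform: Callable[[str], str],
                      audit: Callable[[str], list],
                      max_attempts: int = 2) -> dict:
    """Run transform, audit the result, retry on failure, quarantine after max_attempts."""
    failures: list = []
    for attempt in range(1, max_attempts + 1):
        output = transform(document)
        failures = audit(output)
        # Log input and output together so any run can be replayed and debugged.
        logger.info(json.dumps({
            "doc_id": doc_id,
            "attempt": attempt,
            "input": document,
            "output": output,
            "failures": failures,
        }))
        if not failures:
            return {"doc_id": doc_id, "status": "ok",
                    "output": output, "attempts": attempt}
    # Every attempt failed its audit: hold the document for human review.
    return {"doc_id": doc_id, "status": "quarantined",
            "failures": failures, "attempts": max_attempts}
```

Quarantined records carry their failure list, so a reviewer starts from what went wrong and not from the raw document.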

Applying the Model to a Real Job

Stages are easier to trust when you see them run against an actual document. Consider transforming a batch of service contracts into structured records.

Walking through the contracts

  • Examine reveals that the contracts share a template but vary in their payment-terms section, and that several run past the context window.
  • eXpress produces a contract record schema: parties, effective date, term length, payment terms, and a list of obligations, with nulls allowed for absent fields.
  • Transform writes a short instruction referencing that schema, plus one worked example showing how an obligation clause maps to a list entry.
  • Reconcile handles the long contracts by chunking on section headings with a small overlap, then merging the obligation lists.
  • Audit parses every record, checks parties and dates against the source, and counts obligations to catch any dropped at a chunk boundary.
  • Control sets a low temperature, logs each contract and its output, and routes any record that fails validation to a reviewer.

The point is that no stage is optional once the job is real. Skipping Examine hides the payment-terms variation; skipping Audit lets a dropped obligation reach a client. The pre-flight checklist for document transformation prompts turns each of these stages into concrete steps you can follow.
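The record schema from the eXpress step could be written as a small data class. This is a sketch under assumptions: the field names follow the list above, but the snake_case spelling, the string types, and the strict key check are illustrative choices.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ContractRecord:
    parties: list[str] = field(default_factory=list)
    effective_date: Optional[str] = None  # assumed ISO format, e.g. "2024-01-01"
    term_length: Optional[str] = None     # null when the contract states no term
    payment_terms: Optional[str] = None
    obligations: list[str] = field(default_factory=list)


def parse_record(raw: str) -> ContractRecord:
    """Parse model output into a ContractRecord, rejecting unknown or missing keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    expected = {f.name for f in fields(ContractRecord)}
    if set(data) != expected:
        # Report the keys that are extra or missing relative to the schema.
        raise ValueError(f"keys differ from schema: {sorted(set(data) ^ expected)}")
    return ContractRecord(**data)
```

For example, an output with `"term_length": null` parses cleanly, while an output that omits the key is rejected, which keeps "absent in the source" distinct from "dropped by the model".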

Knowing When to Collapse Stages

The model's discipline is valuable, but applying all six stages to trivial work is its own kind of waste.

Matching effort to the job

  • A short, one-off summary may need only Examine, eXpress, and Transform. Reconcile and Control add nothing when you run the task once by hand.
  • A repeated extraction at scale needs every stage, because Audit and Control are exactly what make unattended runs safe.
  • An interpretive rewrite leans hard on Transform's example and Audit, while Reconcile may not apply if the document fits one pass.

Reading the job correctly is itself a skill the model supports. By naming the stages, it lets you make a deliberate choice about which to use rather than defaulting to either reckless simplicity or needless complexity. The single-pass or chained decision guide gives a rule for that choice.
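The three cases above can be read as a simple selection rule. The function below is one interpretation of those bullets, not the rule from the decision guide. The three boolean inputs are assumptions about how a job might be described.

```python
ALL_STAGES = ["Examine", "eXpress", "Transform", "Reconcile", "Audit", "Control"]


def stages_for(fits_one_pass: bool, repeated: bool, interpretive: bool) -> list[str]:
    """Pick the stages a job needs, following the three cases described above."""
    # Every job starts with the first three stages.
    needed = {"Examine", "eXpress", "Transform"}
    if repeated:
        # Repeated extraction at scale needs every stage.
        needed.update(ALL_STAGES)
    if not fits_one_pass:
        needed.add("Reconcile")
    if interpretive:
        # Judgment-heavy work leans on Audit even when run once.
        needed.add("Audit")
    # Return the stages in their canonical order.
    return [stage for stage in ALL_STAGES if stage in needed]
```

A one-off summary of a short document maps to `stages_for(True, False, False)`, which returns only the first three stages.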

Frequently Asked Questions

Do I have to use all six stages every time?

No. For a one-off transformation of a short document, Examine, eXpress, and Transform may be enough. The later stages earn their place as volume and stakes rise. The model's value is knowing which stages a given job requires, not forcing all six onto trivial work.

How is this different from just writing a careful prompt?

A careful prompt collapses every concern into one instruction, which makes it fragile and hard to maintain. The EXTRACT model separates concerns so you can change the output contract without touching the chunking logic, or improve auditing without rewriting the core instruction. Separation is what makes the work survive over time.

Where do most teams go wrong with this model?

They skip Examine because it produces no visible artifact, then discover the document's quirks in production. They also tend to merge eXpress into Transform, which buries the output contract inside the instruction where it is hard to find and change later.

Can the model handle transformations that require judgment, not just extraction?

Yes, and the example in the Transform stage is how you encode judgment. For tasks like deciding which clauses count as obligations, a single worked example teaches the rule more effectively than a paragraph of description. Judgment-heavy tasks lean harder on the Audit stage as well.

How does this model scale across a team?

Because each stage is named and self-contained, different people can own different stages. One person maintains contracts, another owns chunking logic, a third runs audits. The shared vocabulary means a handoff does not require re-explaining the whole pipeline.

Key Takeaways

  • EXTRACT structures document transformation into six ordered stages rather than one prompt.
  • Examine the source and eXpress the output contract before writing any transformation logic.
  • Keep the core Transform instruction short by leaning on the contract and one example.
  • Reconcile long documents with boundary-aware chunking and a defined merge strategy.
  • Audit against the contract and source, then Control throughput with fallbacks and logging.
  • Apply only the stages a given job needs; the value is knowing which ones.
