Where Distillation Earned Its Keep, and Where It Flopped

Agency Script Editorial, Editorial Team · March 15, 2025 · 7 min read

Abstract explanations of distillation only go so far. What makes the concept click is seeing it applied to specific problems, with the detail of why it worked in one case and failed in another. This article walks through concrete use cases across different domains. For each, we describe the setup, what made distillation a fit or a mismatch, and the lesson.

A note on what follows: these are representative scenarios drawn from common patterns, illustrating typical trade-offs rather than any single company's confidential results. The point is the shape of the decision, not a leaderboard. If you want the underlying mechanics first, start with the complete guide.

Use Case 1: Support Ticket Triage

A company routes incoming support tickets to the right team using a large language model. The model is accurate but expensive at high volume, and triage runs on every single ticket.

Why It Worked

The task is narrow and well-defined: classify a ticket into one of a fixed set of queues. Years of historical tickets with known correct routing made the production distribution easy to match. The teacher's outputs could be checked against actual routing outcomes, so filtering out its mistakes before training was straightforward. A small student preserved nearly all the routing accuracy at a fraction of the per-ticket cost, which mattered because the volume was enormous.

The lesson: high volume plus a narrow, checkable task is the ideal distillation profile.
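The filtering step in the triage case can be sketched in a few lines. This is a minimal illustration, not the team's actual pipeline: the `Ticket` fields, the queue names, and the sample tickets are all hypothetical, and it assumes each historical ticket records the queue it was finally resolved in.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    text: str
    historical_queue: str  # queue the ticket was actually resolved in (assumed available)
    teacher_queue: str     # queue predicted by the large teacher model


def filter_teacher_labels(tickets):
    """Keep only tickets where the teacher's label matches the known routing."""
    return [
        (t.text, t.teacher_queue)
        for t in tickets
        if t.teacher_queue == t.historical_queue
    ]


# Hypothetical sample data for illustration only.
tickets = [
    Ticket("Card was charged twice", "billing", "billing"),
    Ticket("App crashes on login", "technical", "technical"),
    Ticket("Where is my refund?", "billing", "account"),  # teacher disagrees, so it is dropped
]

training_set = filter_teacher_labels(tickets)
print(len(training_set))  # 2
```

The student then trains only on labels that both the teacher and the routing history agree on, which is what makes a checkable task cheap to filter.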

Use Case 2: On-Device Translation

A mobile app needs to translate text offline, with no network call. The strongest translation models are far too large to ship on a phone.

Why It Worked, With a Caveat

Here distillation was not an optimization — it was the only option, because the teacher physically cannot run on the device. The student was distilled from a large translation model down to something that fits in a phone's memory budget. Quality dropped noticeably compared to the teacher, but the alternative was no offline translation at all, so the trade was easy.

The caveat: the team had to accept a real quality gap. On-device distillation often means compressing harder than you would like, and you live with the result. The framework article covers how to reason about that compression ceiling.

Use Case 3: Search Result Ranking

A search system uses a heavy model to re-rank candidate results for relevance. Re-ranking is latency-critical — it sits directly in the user's path — and a slow model degrades the whole experience.

Why It Worked

The driver here was latency, not just cost. The large re-ranker produced excellent rankings but was too slow to run inline. A distilled student ran fast enough to stay in the request path while preserving most of the ranking quality. Because ranking quality could be measured against click and relevance signals, the team could evaluate the student precisely.

The lesson: distillation is as much a latency tool as a cost tool. When a great model is too slow to use inline, a fast student can make it deployable.

There is a subtlety worth noting in ranking distillation specifically. The team did not need the student to reproduce the teacher's exact relevance scores — only its ordering of results. That relaxation made the distillation easier than a pure regression task, because the student could be slightly off on absolute scores as long as it ranked the right documents above the wrong ones. Recognizing what your task actually requires — exact outputs versus correct ordering versus correct top choice — lets you set a looser, more achievable target and often a smaller student.
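One way to make the "ordering, not scores" target concrete is to measure how often the student orders a pair of documents the same way the teacher does. The sketch below is an assumption about how such a check could look; the function name and the scores are invented for illustration, and a real system might prefer an established ranking metric instead.

```python
from itertools import combinations


def pairwise_order_agreement(teacher_scores, student_scores):
    """Fraction of document pairs the student orders the same way as the teacher.

    Pairs the teacher scores as exactly tied are skipped, since they carry
    no ordering information.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(teacher_scores)), 2):
        t_diff = teacher_scores[i] - teacher_scores[j]
        if t_diff == 0:
            continue
        s_diff = student_scores[i] - student_scores[j]
        total += 1
        if t_diff * s_diff > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else 1.0


# Hypothetical relevance scores for four candidate results.
teacher = [0.92, 0.75, 0.40, 0.10]
same_order = [0.60, 0.55, 0.30, 0.05]  # very different scores, identical ordering
top_swapped = [0.55, 0.60, 0.30, 0.05]  # first two results swapped

print(pairwise_order_agreement(teacher, same_order))   # 1.0
print(pairwise_order_agreement(teacher, top_swapped))  # 5 of 6 pairs agree
```

A student whose absolute scores are far from the teacher's can still score perfectly here, which is exactly the relaxation the ranking case relied on.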

Use Case 4: Content Moderation at Scale

A platform moderates user content with a large model. Volume is massive: the cost per item is small, but it adds up across billions of items.

Where It Got Tricky

This one is instructive because it nearly failed. The aggregate accuracy of the student looked excellent, but evaluation by slice revealed it was much weaker on rare, high-severity categories — exactly the ones that matter most. The training distribution had too few examples of the dangerous edge cases.

The Fix

The team oversampled the rare high-severity categories in the training set and added a teacher fallback for low-confidence moderation decisions. After that, the student held up on the slices that mattered. This is a textbook case of why you evaluate by slice rather than average.
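Evaluating by slice rather than by average can be sketched as follows. The slice names, labels, and counts are made up to show the effect; they are not results from any real system.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """Compute accuracy per slice from (slice_name, predicted, expected) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, predicted, expected in examples:
        total[slice_name] += 1
        correct[slice_name] += int(predicted == expected)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical evaluation set: a large common slice and a tiny high-severity slice.
examples = (
    [("spam", "remove", "remove")] * 95
    + [("spam", "allow", "remove")] * 1
    + [("severe", "remove", "remove")] * 2
    + [("severe", "allow", "remove")] * 2
)

overall = sum(p == e for _, p, e in examples) / len(examples)
per_slice = accuracy_by_slice(examples)

print(f"overall: {overall:.2f}")
for name, acc in per_slice.items():
    print(f"{name}: {acc:.2f}")
```

In this toy data the aggregate comes out at 97 percent while the `severe` slice sits at 50 percent, the same kind of gap the moderation team only saw once they broke the numbers out.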

Use Case 5: A Case Where Distillation Was the Wrong Call

Not every example is a success. A small team wanted to distill a large model for an internal research-assistant tool used a few dozen times a day by a handful of analysts.

Why It Did Not Pay Off

The teacher's total cost at that volume was already trivial. The task was also broad — open-ended research questions — which meant the student would need to be large to preserve quality, eroding the savings. The engineering effort to build and maintain a distillation pipeline dwarfed any cost it could recover.

The right answer was to do nothing: keep calling the teacher directly. The lesson is that distillation is justified by scale and narrowness. Without both, simpler options win. Our best practices article makes the case for always running a "do nothing" baseline first.
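A rough break-even calculation makes the "do nothing" baseline easy to run before any engineering starts. Every number below is a placeholder chosen for illustration, and the function itself is a simplification: it assumes flat per-request costs and a one-time pipeline cost, and it ignores any quality difference between teacher and student.

```python
def breakeven_months(requests_per_day, teacher_cost, student_cost,
                     pipeline_cost, monthly_maintenance=0.0):
    """Months until per-request savings repay the one-time pipeline cost.

    Returns None when the monthly savings never exceed the maintenance cost.
    """
    monthly_savings = (
        requests_per_day * 30 * (teacher_cost - student_cost) - monthly_maintenance
    )
    if monthly_savings <= 0:
        return None
    return pipeline_cost / monthly_savings


# Hypothetical costs: 0.02 per teacher call, 0.002 per student call,
# 20,000 to build the distillation pipeline.
low_volume = breakeven_months(50, 0.02, 0.002, 20_000)
high_volume = breakeven_months(2_000_000, 0.02, 0.002, 20_000)
with_upkeep = breakeven_months(50, 0.02, 0.002, 20_000, monthly_maintenance=100)

print(low_volume)   # hundreds of months at a few dozen requests a day
print(high_volume)  # well under one month at millions of requests a day
print(with_upkeep)  # None: maintenance alone exceeds the savings
```

With these placeholder figures the low-volume tool would take decades to pay back, which is the arithmetic behind keeping the teacher in Use Case 5.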

Use Case 6: Structured Extraction From Documents

A company extracts structured fields — dates, amounts, party names — from semi-structured documents. A large model handled the messiness well but cost too much to run on every document at their throughput.

Why It Worked

Structured extraction is a near-ideal distillation target. The output is a defined schema, so correctness is mostly checkable: a date is right or wrong, an amount matches or does not. That made teacher filtering almost fully automatable. The team verified extracted fields against known-good records, dropped the teacher's errors, and trained the student on clean, schema-conformant outputs. Because the task was so well-bounded, a small student preserved high field-level accuracy.

The lesson: tasks with a checkable, structured output are the easiest to distill well, because the same structure that defines correctness also automates your filtering and evaluation.
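The automated filtering described for extraction might look something like this sketch. The field names (`date`, `amount`, `party`), the document IDs, and the records are assumptions for illustration; a real schema would have its own fields and validation rules.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def is_schema_conformant(record):
    """Check that a record has a valid ISO date, a non-negative amount, and a party name."""
    try:
        date.fromisoformat(record["date"])
        amount = Decimal(record["amount"])
    except (KeyError, TypeError, ValueError, InvalidOperation):
        return False
    party = str(record.get("party", "")).strip()
    return amount.is_finite() and amount >= 0 and bool(party)


def filter_extractions(teacher_outputs, known_good):
    """Keep teacher outputs that pass the schema and match a reference record if one exists."""
    kept = []
    for doc_id, record in teacher_outputs.items():
        if not is_schema_conformant(record):
            continue
        reference = known_good.get(doc_id)
        if reference is not None and reference != record:
            continue
        kept.append((doc_id, record))
    return kept


# Hypothetical teacher outputs and known-good records.
teacher_outputs = {
    "doc-1": {"date": "2024-01-15", "amount": "120.50", "party": "Acme Ltd"},
    "doc-2": {"date": "2024-02-30", "amount": "80.00", "party": "Beta LLC"},  # impossible date
    "doc-3": {"date": "2024-03-01", "amount": "45.00", "party": "Gamma Inc"},  # wrong amount
    "doc-4": {"date": "2024-04-10", "amount": "10.00", "party": "Delta Co"},  # no reference
}
known_good = {
    "doc-1": {"date": "2024-01-15", "amount": "120.50", "party": "Acme Ltd"},
    "doc-3": {"date": "2024-03-01", "amount": "54.00", "party": "Gamma Inc"},
}

kept = filter_extractions(teacher_outputs, known_good)
print([doc_id for doc_id, _ in kept])  # ['doc-1', 'doc-4']
```

Whether to keep records that have no reference, as `doc-4` is kept here, is a design choice; a stricter pipeline could require a known-good match for every training example.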

What the Examples Have in Common

Pull these together and a pattern emerges. The wins shared three traits: high volume or strict latency needs, a narrow and well-defined task, and a way to measure and filter teacher quality. The struggles came from broad tasks, low volume, or distributions that missed the critical edge cases. When you are evaluating your own problem, score it against those traits before committing.

Notice too that several of the wins relaxed what "match the teacher" meant. The ranking case needed only correct ordering; the extraction case needed only schema-conformant fields; the triage case needed only the right queue. None of them required the student to reproduce the teacher's full output verbatim. Identifying the minimal thing your task actually needs is a recurring lever — it lowers the bar the student must clear and, with it, the size and cost of the student you can get away with.

Frequently Asked Questions

What kind of task distills best?

A narrow, well-defined task with high request volume and a measurable output. Classification, routing, ranking, and structured extraction distill cleanly because you can match the distribution and check the teacher's outputs.

When is distillation a mismatch?

Low-volume tasks where the teacher is already cheap, broad open-ended tasks that need the teacher's full capability, and any case where you cannot build a training set that matches production. In those situations, simpler alternatives usually win.

Why did the moderation example almost fail?

The training distribution underrepresented rare, high-severity categories, so the student was weak exactly where it mattered most while looking strong in aggregate. Oversampling the rare cases and adding a teacher fallback fixed it.

Is on-device distillation different from cost-driven distillation?

In motivation, yes. On-device distillation is often forced — the teacher cannot run on the hardware at all — so you accept a larger quality gap than you would for a pure cost optimization. The technique is the same; the tolerance for quality loss is different.

Can I combine distillation with a fallback to the teacher?

Yes, and it is a strong pattern. Serve the cheap student for confident cases and route low-confidence or high-stakes inputs back to the teacher. The content moderation example used exactly this hybrid.
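A minimal sketch of that hybrid, assuming the student returns a label together with a confidence score. The threshold, the `high_stakes` set, and the two stub models are hypothetical stand-ins; real confidence scores usually need calibration before a threshold like this means much.

```python
def route(text, student, teacher, threshold=0.9, high_stakes=frozenset()):
    """Serve the student's answer unless it is unsure or the label is high-stakes."""
    label, confidence = student(text)
    if confidence < threshold or label in high_stakes:
        return teacher(text), "teacher"
    return label, "student"


# Stub models standing in for a real student and teacher.
def stub_student(text):
    if "threat" in text:
        return "severe", 0.95
    if "maybe" in text:
        return "allow", 0.55
    return "allow", 0.99


def stub_teacher(text):
    return "severe" if "threat" in text else "allow"


print(route("hello there", stub_student, stub_teacher))
print(route("maybe spam", stub_student, stub_teacher))
print(route("a threat", stub_student, stub_teacher, high_stakes={"severe"}))
```

The first input is served by the student, the second falls back because confidence is below the threshold, and the third falls back because its label is marked high-stakes even though the student was confident.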

Key Takeaways

  • The best distillation use cases are narrow, high-volume, and measurable: support triage, ranking, structured extraction.
  • On-device deployment often forces distillation because the teacher cannot fit the hardware, at the cost of a larger quality gap.
  • Latency-critical inline tasks like search re-ranking benefit as much from the student's speed as from its lower cost.
  • Slice-based evaluation catches failures on rare but critical categories that aggregate accuracy hides.
  • Low-volume or broad tasks are often better served by doing nothing and calling the teacher directly.
