AGENCYSCRIPT
Diagnose Before You Treat a Broken Model

Agency Script Editorial

Editorial Team

May 17, 2025 · 8 min read

Most people approach overfitting and underfitting as a set of disconnected tips: add dropout, get more data, try regularization. That scattershot habit is why models stay broken. The fixes are not interchangeable. Each one solves exactly one problem and worsens the other. What you need is a sequence that diagnoses before it treats.

This guide is that sequence. Follow it in order, on your own model, today. Each step has a clear input, a clear output, and a decision that routes you to the next step. Do not skip the diagnosis steps to get to the fixes faster. Skipping diagnosis is how people spend a week adding regularization to a model that was underfitting the whole time.

By the end you will have a model whose generalization you understand and a clear record of which knob did what. That record is worth as much as the model.

Step 1: Build an Honest Data Split

Before any modeling, partition your data into three parts: training, validation, and a final test set you will not touch until the very end.

  • Training set: the model learns from this.
  • Validation set: you tune against this.
  • Test set: you look at this exactly once, at the end.
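A minimal sketch of the three-way split, assuming your records fit in a list and are not time-ordered. The function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not recommendations from this guide.

```python
import random

def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rows = list(range(100))  # stand-in for real records
train, val, test = three_way_split(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

Fixing the seed keeps the split reproducible, so later comparisons are always made against the same validation rows. For time-ordered data, cut by time instead of shuffling.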

Avoid Leakage Now

Fit every preprocessing transform (scaling, encoding, imputation) on the training set only, then apply it to the others. If you compute a mean across the whole dataset before splitting, you have leaked information and every downstream number is optimistic. This single discipline prevents the most common silent failure.
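To make the rule concrete, here is a small sketch of a standardizing scaler whose parameters come from the training values alone. The helper names `fit_scaler` and `apply_scaler` are hypothetical; a library pipeline would normally do this for you.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Learn mean and standard deviation from training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant column
    return mu, sigma

def apply_scaler(values, params):
    """Standardize any split using parameters learned elsewhere."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
val_x = [6.0, 7.0]

params = fit_scaler(train_x)                  # statistics come from training data only
train_scaled = apply_scaler(train_x, params)
val_scaled = apply_scaler(val_x, params)      # reuse the same parameters, never refit

leaky_params = fit_scaler(train_x + val_x)    # the mistake: validation rows shift the mean
print(params[0], leaky_params[0])             # 3.0 4.0
```

The two means differ, which is the leak in miniature: the "leaky" scaler has already seen the validation rows.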

Step 2: Train a Deliberately Simple Baseline

Start with the simplest reasonable model: a linear or logistic regression, or a shallow tree. Record training error and validation error.

This baseline is not your final model. It is a reference point. Every later change is judged against it. If a complex model does not beat your simple baseline on the validation set, the complexity is buying you nothing.
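As a sketch of what "simple baseline plus recorded errors" can look like, the block below fits a one-feature least-squares line and stores both errors. The toy numbers and the names `fit_line` and `mse` are assumptions made for illustration; substitute your own data and whichever simple model suits it.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mse(model, xs, ys):
    """Mean squared error of a fitted line on any split."""
    slope, intercept = model
    return mean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])

# Toy data, invented for this sketch.
train_x, train_y = [0, 1, 2, 3, 4, 5], [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
val_x, val_y = [6, 7], [12.2, 13.8]

model = fit_line(train_x, train_y)
baseline = {
    "train_mse": mse(model, train_x, train_y),
    "val_mse": mse(model, val_x, val_y),
}
print(baseline)  # keep this record; later models must beat val_mse
```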

Step 3: Read the Learning Curve to Diagnose

Now plot training and validation error against training set size, or at minimum compare the two final numbers. The pattern routes everything that follows.

  • Both errors high, gap small: you are underfitting. Go to Step 4.
  • Training error low, validation error high, gap large: you are overfitting. Go to Step 5.
  • Both errors low, gap small: you are in good shape. Go to Step 6.

This is the fork in the road. The whole reason for Steps 1 and 2 was to make this diagnosis trustworthy. For the deeper theory of why this gap maps to bias and variance, see The Complete Guide to Ai Model Overfitting and Underfitting.
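The routing above can be written down as a small function. This is only a sketch: the thresholds `acceptable_err` and `max_gap` are assumptions you must set for your own problem, and the "ambiguous" branch covers every pattern the three rules do not name.

```python
def diagnose(train_err, val_err, acceptable_err, max_gap):
    """Route to a step based on the training-versus-validation pattern."""
    gap = val_err - train_err
    if train_err > acceptable_err and gap <= max_gap:
        return "underfitting"  # both errors high, gap small: Step 4
    if train_err <= acceptable_err and gap > max_gap:
        return "overfitting"   # training low, validation high: Step 5
    if val_err <= acceptable_err and gap <= max_gap:
        return "good"          # both errors low, gap small: Step 6
    return "ambiguous"         # probe carefully, one change at a time

# Thresholds below are illustrative, not recommendations.
print(diagnose(0.30, 0.32, acceptable_err=0.10, max_gap=0.05))  # underfitting
print(diagnose(0.02, 0.25, acceptable_err=0.10, max_gap=0.05))  # overfitting
print(diagnose(0.05, 0.07, acceptable_err=0.10, max_gap=0.05))  # good
```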

Step 4: The Underfitting Fix Sequence

If you are underfitting, apply these in order, re-checking the learning curve after each change. Stop as soon as the validation error reaches an acceptable level.

  1. Add model capacity. Move from linear to a tree ensemble, or add layers and units to a network.
  2. Engineer features. Add interaction terms, polynomial features, or domain signals the model cannot derive on its own.
  3. Reduce regularization. If you set a penalty earlier, lower it.
  4. Train longer. Increase epochs or iterations if the loss was still falling.

Change one thing, re-measure, then decide whether to continue. Changing several at once makes it impossible to know what helped.
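One way to enforce "one change, then re-measure" is to drive the sequence from a loop that logs every trial. The sketch below assumes a hypothetical `evaluate(config)` callable that trains a model and returns its errors; `toy_evaluate` is a made-up stand-in with invented numbers so the example runs, not a real training routine.

```python
def run_fix_sequence(base_config, changes, evaluate, target_val_err):
    """Try one change at a time, log each result, keep only changes that help."""
    config = dict(base_config)
    best = evaluate(config)
    log = [("baseline", best)]
    for name, update in changes:
        if best["val"] <= target_val_err:
            break  # acceptable level reached, stop here
        trial = {**config, **update}
        result = evaluate(trial)
        log.append((name, result))
        if result["val"] < best["val"]:
            config, best = trial, result  # keep the change that helped
    return config, log

def toy_evaluate(config):
    """Stand-in for real training: error falls with capacity, rises with penalty."""
    val = 0.40 - 0.05 * config["capacity"] + 0.10 * config["penalty"]
    return {"train": val - 0.02, "val": val}

changes = [
    ("add capacity", {"capacity": 4}),
    ("reduce regularization", {"penalty": 0.0}),
    ("train longer", {"epochs": 50}),
]
final_config, log = run_fix_sequence(
    {"capacity": 1, "penalty": 1.0, "epochs": 10}, changes, toy_evaluate, 0.25
)
for name, result in log:
    print(name, round(result["val"], 2))
```

The log is the record of which knob did what. Here the loop stops before "train longer" because the target was already met.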

Step 5: The Overfitting Fix Sequence

If you are overfitting, work this list in order. Most of these steps trade a little bias for less variance; more data is the exception, since it lowers variance without adding bias.

  1. Get more training data. The most durable fix. Variance shrinks as examples grow.
  2. Add regularization. L2 for linear models, dropout for networks, max-depth and min-samples limits for trees.
  3. Use early stopping. Halt training when validation error stops improving.
  4. Reduce capacity. Fewer parameters, shallower model, lower polynomial degree.
  5. Augment data. For images or audio, transformations multiply your effective sample size.

Re-check after each. When the gap between training and validation error closes to an acceptable level, stop. The common errors people make in this sequence are catalogued in 7 Common Mistakes with Ai Model Overfitting and Underfitting.
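Early stopping is the easiest item in the list to get subtly wrong, so here is a sketch of the patience rule on its own. It assumes you already have a list of per-epoch validation errors; in a real training loop the same check would run after each epoch. The `patience` and `min_delta` parameters are illustrative names and defaults.

```python
def early_stopping_epoch(val_errors, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors."""
    best_err = float("inf")
    best_epoch = -1
    for epoch, err in enumerate(val_errors):
        if err < best_err - min_delta:
            best_err, best_epoch = err, epoch  # new best, reset the wait
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch           # no improvement for `patience` epochs
    return best_epoch, len(val_errors) - 1     # never triggered, ran to the end

val_errors = [0.50, 0.40, 0.35, 0.33, 0.34, 0.36, 0.37, 0.40]
print(early_stopping_epoch(val_errors))  # (3, 6)
```

Training halts at epoch 6, but the weights you keep are the ones from epoch 3, where validation error was lowest.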

Step 6: Validate With Cross-Validation

A single validation split can be lucky or unlucky. Replace it with k-fold cross-validation, typically five or ten folds, to get a stable estimate and see how much your error varies across folds.
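The mechanics of k-fold are simple enough to sketch directly. The block below assumes the rows are already shuffled, and uses a mean predictor with mean absolute error as a deliberately trivial stand-in model; `k_fold_indices` and `cross_val_errors` are hypothetical helper names.

```python
from statistics import mean, stdev

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs covering range(n) in k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

def cross_val_errors(xs, ys, k, fit, error):
    """Fit on each training fold and score on the held-out fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(error(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores

# Toy model for the sketch: predict the training mean, score with MAE.
fit_mean = lambda xs, ys: mean(ys)
mae = lambda model, xs, ys: mean([abs(model - y) for y in ys])

xs = list(range(10))
ys = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
scores = cross_val_errors(xs, ys, 5, fit_mean, mae)
print(round(mean(scores), 3), round(stdev(scores), 3))  # average error and its spread
```

Report the spread alongside the mean. Any preprocessing must be fitted inside each training fold, for the same leakage reason given in Step 1.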

Read the Variance Across Folds

If error swings widely from fold to fold, your model is sensitive to the specific training data, a sign of residual overfitting. Tight, consistent fold scores indicate a model that generalizes. For time-series data, replace random folds with forward-chaining splits so you never train on the future.
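For the time-series case, a forward-chaining split can be sketched as follows, assuming the rows are sorted oldest to newest. The function name and the equal-sized blocks are illustrative choices; libraries offer comparable splitters.

```python
def forward_chaining_splits(n, n_splits):
    """Yield (train_idx, val_idx) where validation always follows training in time."""
    fold = n // (n_splits + 1)
    if fold == 0:
        raise ValueError("need at least n_splits + 1 rows")
    for i in range(1, n_splits + 1):
        train_end = fold * i
        # The last split absorbs any leftover rows.
        val_end = n if i == n_splits else fold * (i + 1)
        yield list(range(train_end)), list(range(train_end, val_end))

for train_idx, val_idx in forward_chaining_splits(10, 4):
    print(train_idx, val_idx)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Each training window ends before its validation window begins, so no fold ever sees the future.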

Step 6b: Pressure-Test Across Segments

Before you trust an aggregate cross-validation score, slice it. Group your validation predictions by meaningful segments (customer tier, geography, time period, device type, or whatever matters for your problem) and compute error within each group.

Why Aggregates Lie

A model can post a respectable overall score while quietly failing an important slice. Suppose it underfits new customers but overfits long-tenured ones; the average looks fine and hides both problems. Per-segment evaluation exposes this unevenness, which an aggregate number cannot.

When you find a struggling segment, decide whether it deserves a targeted fix (more data for that slice, or a segment-specific feature) or whether the aggregate model is acceptable for your use case. Either way, you made the call with eyes open rather than discovering the weakness in production.
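Slicing takes only a few lines. The sketch below assumes each validation prediction is stored as a `(segment, y_true, y_pred)` tuple and uses mean absolute error; the segment labels and numbers are invented to show how an average can hide a weak slice.

```python
from collections import defaultdict
from statistics import mean

def error_by_segment(records):
    """records: iterable of (segment, y_true, y_pred). Returns {segment: MAE}."""
    errors = defaultdict(list)
    for segment, y_true, y_pred in records:
        errors[segment].append(abs(y_true - y_pred))
    return {segment: mean(vals) for segment, vals in errors.items()}

# Toy validation predictions, invented for this sketch.
records = [
    ("new", 10.0, 16.0), ("new", 12.0, 6.0),
    ("tenured", 10.0, 10.5), ("tenured", 12.0, 11.5),
    ("tenured", 11.0, 11.0), ("tenured", 9.0, 9.0),
]
overall = mean([abs(t - p) for _, t, p in records])
print(round(overall, 2))          # 2.17
print(error_by_segment(records))  # {'new': 6.0, 'tenured': 0.25}
```

The overall error of about 2.17 describes neither group: the "new" segment is far worse and the "tenured" segment far better.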

Step 7: Run the Final Test Once and Stop

Take the configuration that won on cross-validation and evaluate it on the test set you have not touched. This number is your honest estimate of production performance.

Critically, do not now go back and tune to improve this number. The moment you optimize against the test set, it stops being a test set and your estimate becomes fiction. If the test number disappoints, the correct response is to collect more data or rethink the problem, not to keep poking the holdout. The best practices behind this discipline are expanded in Ai Model Overfitting and Underfitting: Best Practices That Actually Work.
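If you want the "once" rule enforced by code rather than willpower, one option is a small wrapper that refuses a second evaluation. This is a hypothetical helper sketched for illustration, not a standard library feature; `predict` and `error` are whatever callables your project already uses.

```python
class HoldoutOnce:
    """Wraps a test set so it can be scored a single time."""

    def __init__(self, xs, ys):
        self._xs, self._ys = list(xs), list(ys)
        self._used = False

    def evaluate(self, predict, error):
        """Score the final model; raise if the test set was already used."""
        if self._used:
            raise RuntimeError("Test set already used; do not tune against it.")
        self._used = True
        return error([predict(x) for x in self._xs], self._ys)

def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

holdout = HoldoutOnce([1.0, 2.0, 3.0], [2.0, 4.0, 6.5])
final_score = holdout.evaluate(lambda x: 2.0 * x, mean_abs_error)
print(round(final_score, 3))  # 0.167
```

A second call to `holdout.evaluate` raises `RuntimeError`, which turns a tempting shortcut into a visible decision.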

Frequently Asked Questions

How many fixes should I apply before re-checking?

Exactly one. The entire value of this sequence comes from isolating cause and effect. If you apply three changes and the model improves, you have learned nothing about which change mattered, and you may be carrying a harmful change masked by two helpful ones.

What if my diagnosis is ambiguous, with a moderate gap?

A moderate gap with moderate error often means you have headroom in both directions. Try adding a little capacity first; if the gap widens sharply, you have hit the overfitting regime and should back off and regularize instead. Treat the ambiguous zone as a place to probe carefully, one step at a time.

Can I skip the simple baseline to save time?

No. Without the baseline you have no reference for whether complexity helps, and you are far more likely to ship an overcomplicated model that overfits. The baseline takes minutes and saves hours. It is the cheapest insurance in the workflow.

Why fit preprocessing only on the training set?

Because fitting on the full dataset lets information from validation and test data influence your transforms, which leaks the answer and inflates your scores. The model appears to generalize better than it will in production, and you discover the gap only after deployment. Fit on training folds, apply to the rest.

When do I stop iterating?

Stop when validation error reaches a level acceptable for your use case and additional changes yield diminishing returns, or when you have exhausted the relevant fix sequence. Chasing marginal gains past the point of diminishing returns often introduces fragility. Ship the simplest model that meets the bar.

Key Takeaways

  • Split into training, validation, and an untouched test set before doing anything else.
  • Train a simple baseline as your reference point.
  • Diagnose with the training-versus-validation gap before choosing any fix.
  • Underfitting and overfitting have opposite fix sequences; apply only the one your diagnosis points to.
  • Change one thing at a time and re-measure.
  • Run the final test set once and resist the urge to tune against it.
