"More data fixes overfitting." "If a model fits the training data perfectly, it is broken." "Simpler models are always safer." These statements get repeated until they feel like laws. They are, at best, half-true — and acting on the half that is wrong leads to wasted budget, misdiagnosed models, and shipped failures.
This article takes the most common myths about overfitting and underfitting and replaces each with the accurate picture. The pattern you will notice is that almost every myth is a true heuristic over-generalized into a false rule. The reality is more conditional and more useful.
For the rigorous foundations behind these corrections, The Complete Guide to AI Model Overfitting and Underfitting is the reference. Here we clear away the misconceptions that get in its way.
Myth: More Data Always Fixes Overfitting
The reality is conditional.
When It Is True
If your learning curve shows validation performance still climbing as training-set size grows, more data genuinely helps. This is the case the myth is built on.
When It Is False
If the curve has flattened — validation performance plateaued long ago — more data does almost nothing. You have a capacity or feature problem, not a data-volume one, and you may be looking at underfitting, not overfitting. Buying more data here is expensive and useless. Always check whether the curve is still climbing before you spend on labeling.
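The "is the curve still climbing?" check can be written down as a small rule. The sketch below is one possible way to do it, not a standard test: the function name `curve_still_climbing`, the `min_gain` threshold, the `window` size, and the accuracy figures are all hypothetical and should be tuned to your own task.

```python
def curve_still_climbing(val_scores, min_gain=0.005, window=2):
    """Return True if validation score rose by at least `min_gain`
    across the last `window` increases in training-set size.

    `val_scores` is ordered from smallest to largest training set.
    Both thresholds are illustrative assumptions, not standard values.
    """
    if len(val_scores) <= window:
        raise ValueError("need more points than the window size")
    recent_gain = val_scores[-1] - val_scores[-1 - window]
    return recent_gain >= min_gain

# Hypothetical validation accuracies at 1k, 2k, 4k, 8k, 16k training examples.
climbing = [0.71, 0.76, 0.80, 0.83, 0.86]
plateaued = [0.71, 0.78, 0.81, 0.812, 0.813]

print(curve_still_climbing(climbing))   # True: more data is likely to help
print(curve_still_climbing(plateaued))  # False: look at capacity or features
```

In practice the scores should come from a held-out set evaluated at each training-set size, ideally averaged over several random splits so that noise is not mistaken for a trend.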
Myth: A Perfect Training Fit Means the Model Is Overfit
This one is outdated for modern models.
The Old Story
Classical intuition says fitting training data perfectly means you memorized it and will generalize poorly. For small models, this is often right.
The Reality
Large, over-parameterized models routinely fit training data perfectly and still generalize well, a regime closely associated with the double-descent phenomenon. Perfect training fit is no longer automatic proof of overfitting. The only reliable test is measuring performance on held-out data, never inferring it from the training fit alone. The advanced guide covers double descent in depth.
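A toy illustration of why training fit alone proves nothing: a 1-nearest-neighbor classifier memorizes every training point, so its training accuracy is perfect by construction, yet it can still score well on held-out data when the classes are well separated. This is a stand-in memorizer on synthetic data, not an over-parameterized neural network, and it does not demonstrate double descent. All names and data here are assumptions made for the sketch.

```python
import random

def nearest_neighbor_predict(train_points, train_labels, query):
    """Predict the label of the single closest training point (1-NN)."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: (train_points[i][0] - query[0]) ** 2
        + (train_points[i][1] - query[1]) ** 2,
    )
    return train_labels[best_index]

def accuracy(train_points, train_labels, points, labels):
    """Accuracy of the 1-NN model on (points, labels)."""
    hits = sum(
        nearest_neighbor_predict(train_points, train_labels, p) == y
        for p, y in zip(points, labels)
    )
    return hits / len(points)

def make_blobs(n_per_class, rng):
    """Two synthetic Gaussian clusters, one per class."""
    points, labels = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            points.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            labels.append(label)
    return points, labels

rng = random.Random(0)
X_train, y_train = make_blobs(100, rng)
X_test, y_test = make_blobs(100, rng)  # held out: never seen by the model

train_acc = accuracy(X_train, y_train, X_train, y_train)
test_acc = accuracy(X_train, y_train, X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The training number is uninformative here because it is perfect regardless of how hard the problem is. Only the held-out number says anything about generalization.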
Myth: Simpler Models Are Always Safer
Simplicity trades one failure for another.
The Half-Truth
Simpler models resist overfitting — true. So people reach for the simplest model as a safe default.
The Missing Half
A model too simple for the problem underfits, capping performance below what the task needs. "Safe" from overfitting is not the same as "good." The goal is the right capacity for the problem, not the minimum capacity, and you find it by measuring validation performance together with the generalization gap. Reflexive simplicity manufactures underfitting just as reflexive complexity manufactures overfitting.
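To make the distinction concrete, here is a rough triage rule that reads the training score, the validation score, and the gap between them. It is a sketch under stated assumptions: the function name `diagnose`, the `target_score` you supply, and the `max_gap` default of 0.05 are all hypothetical, and real thresholds depend on the task and metric.

```python
def diagnose(train_score, val_score, target_score, max_gap=0.05):
    """Rough triage from two scores (higher is better).

    - Large train/validation gap        -> overfitting
    - Small gap but validation too low  -> underfitting
    - Small gap and validation on target -> good fit
    """
    gap = train_score - val_score
    if gap > max_gap:
        return "overfitting"
    if val_score < target_score:
        return "underfitting"
    return "good fit"

# Hypothetical results for three models of increasing capacity.
print(diagnose(0.72, 0.70, target_score=0.90))  # underfitting
print(diagnose(0.93, 0.91, target_score=0.90))  # good fit
print(diagnose(0.99, 0.80, target_score=0.90))  # overfitting
```

Note that the simplest model in this example has the smallest gap and is still the wrong choice, because its validation score never reaches the target.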
Myth: High Accuracy Means the Model Generalizes
High accuracy on what? That is the whole question.
Where It Goes Wrong
- On training data: high training accuracy with low validation accuracy is the signature of overfitting, not proof of generalization.
- On imbalanced data: 95% accuracy when 95% of cases are one class can mean the model learned nothing, because a model that always predicts the majority class scores exactly 95% while detecting no minority cases.
- On a contaminated benchmark: a high score on a test set that leaked into training measures memorization, not generalization.
The reality: accuracy is meaningful only on a clean, held-out, appropriately-balanced set, with the right metric for the class distribution. The metrics article explains metric selection.
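The imbalanced-data case is easy to verify by hand. The sketch below scores a do-nothing model that always predicts the majority class on a synthetic 95/5 split; the helper functions are written out in plain Python for illustration rather than taken from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` cases that the model caught."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)

# Synthetic data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))             # 0.95
print(recall(y_true, y_pred, positive=1))   # 0.0
print(balanced_accuracy(y_true, y_pred))    # 0.5
```

Balanced accuracy of 0.5 on a two-class problem is chance level, which is the honest description of this model even though plain accuracy reads 95%.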
Myth: Regularization Is Free Insurance
Regularization has a cost, and overusing it backfires.
The Reality
Every regularizer trades training fit for generalization. Add too much and you crush both scores — you have regularized your way into underfitting. Regularization is a dial to tune against the generalization gap, not a lever to crank to maximum "for safety." The right amount closes the gap without lowering both scores.
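The over-regularized end of the dial can be seen in a one-parameter ridge regression, where the L2 penalty shrinks the fitted slope toward zero. This is a deliberately tiny, noise-free example with made-up data, so it only shows how too much penalty damages both scores; it does not show the beneficial range, which needs noisy data and a more flexible model. The function names are illustrative.

```python
def fit_ridge_slope(xs, ys, penalty):
    """Closed-form slope w minimizing sum((y - w*x)^2) + penalty * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mse(xs, ys, slope):
    """Mean squared error of the line y = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data following y = 2x exactly.
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
val_x, val_y = [5, 6], [10, 12]

for penalty in (0, 30, 270):
    w = fit_ridge_slope(train_x, train_y, penalty)
    print(
        f"penalty={penalty:>3}  slope={w:.2f}  "
        f"train MSE={mse(train_x, train_y, w):.2f}  "
        f"val MSE={mse(val_x, val_y, w):.2f}"
    )
```

As the penalty grows, the slope collapses from 2.0 toward zero and both training and validation error climb together. That joint degradation is what distinguishes over-regularization from a regularizer that is doing its job.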
Myth: Cross-Validation Eliminates Overfitting
Cross-validation measures; it does not prevent.
What It Actually Does
K-fold cross-validation gives a more robust estimate of generalization and surfaces variance across folds. It does not stop a model from overfitting. Worse, if you use cross-validation results to tune many hyperparameters, you can overfit to the cross-validation procedure itself. It is a better measurement tool, not a cure, and it can be gamed.
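The "overfitting to the procedure" effect can be simulated without any real model. In the sketch below, every candidate configuration is a coin-flip predictor on pure-noise labels, so none has any skill, yet picking the best of many still produces an impressive-looking score. A single small selection set stands in for the cross-validation estimate to keep the example short; the sizes and names are assumptions chosen for illustration.

```python
import random

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

rng = random.Random(0)
N_SELECTION, N_FRESH, N_CONFIGS = 40, 2000, 200

# Labels are pure noise, so no classifier can truly beat 50% accuracy.
selection_labels = [rng.randint(0, 1) for _ in range(N_SELECTION)]
fresh_labels = [rng.randint(0, 1) for _ in range(N_FRESH)]

# Each hypothetical "configuration" is a coin-flip predictor with no skill.
# We keep the best score any of them happens to reach on the selection data.
best_score = max(
    accuracy([rng.randint(0, 1) for _ in range(N_SELECTION)], selection_labels)
    for _ in range(N_CONFIGS)
)

# The winner is still a coin flip, so on fresh data it simply guesses again.
fresh_preds = [rng.randint(0, 1) for _ in range(N_FRESH)]
fresh_score = accuracy(fresh_preds, fresh_labels)

print(f"best score during tuning: {best_score:.2f}")
print(f"same model on fresh data: {fresh_score:.2f}")
```

The gap between the two numbers is pure selection bias. Common mitigations are an untouched final test set that is scored once, or nested cross-validation, where tuning happens inside an inner loop and the outer loop only measures.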
Myth: Foundation Models Made This Obsolete
The opposite is closer to true.
The Reality
Fine-tuning a foundation model on a small dataset overfits fast. Benchmark contamination makes frozen models look better than they generalize. Weak retrieval or prompting can starve a capable model of signal, which is the modern face of underfitting. The vocabulary moved; the phenomena did not. If anything, the failure modes arrive faster and hide better in the foundation-model era. The 2026 trends article traces how.
Myth: A Low Validation Loss Means You Are Done
The number can be right and the model still wrong.
Where It Misleads
A strong aggregate validation score can hide a model that fails on a critical subgroup, that is badly calibrated, or that was evaluated on a leaked split. "Validation looks great" is the beginning of due diligence, not the end. The reality: a good aggregate number earns you a closer look at segments, calibration, and split integrity — not a deployment.
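Breaking an aggregate score down by segment takes only a few lines. The sketch below uses invented records and hypothetical segment names (`desktop`, `mobile`) to show how a healthy overall number can sit on top of a failing subgroup.

```python
from collections import defaultdict

def accuracy_by_segment(records):
    """Compute accuracy per segment.

    `records` is an iterable of (segment, y_true, y_pred) tuples.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for segment, y_true, y_pred in records:
        totals[segment] += 1
        hits[segment] += int(y_true == y_pred)
    return {segment: hits[segment] / totals[segment] for segment in totals}

# Invented validation results: 90 desktop rows, 10 mobile rows.
records = (
    [("desktop", 1, 1)] * 90   # all correct
    + [("mobile", 1, 0)] * 6   # missed
    + [("mobile", 1, 1)] * 4   # correct
)

overall = sum(t == p for _, t, p in records) / len(records)
per_segment = accuracy_by_segment(records)

print(f"overall: {overall:.2f}")   # overall: 0.94
print(per_segment)                 # {'desktop': 1.0, 'mobile': 0.4}
```

An overall 94% looks deployable, while the mobile segment is wrong more often than it is right. The same breakdown applies to any metric, and it is worth running alongside calibration and split-integrity checks.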
Myth: Early Stopping Is Always the Right Cure for Overfitting
A reasonable default, not a universal one.
The Nuance
Early stopping — halting when validation loss starts rising — works well in the classic regime. But in the over-parameterized regime where double descent occurs, the first rise in validation error is not necessarily the best stopping point; performance can improve again past it. And early stopping does nothing for an underfit model, where stopping earlier only makes things worse. The reality: early stopping is one tool matched to one diagnosis, not a reflex to apply to every model.
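The sensitivity to the first rise can be shown with a patience-based stopping rule, a common way to implement early stopping. The loss curve below is hypothetical and shaped to include a second descent; the function name and patience values are assumptions for the sketch.

```python
def early_stopping_epoch(val_losses, patience):
    """Simulate patience-based early stopping over recorded validation losses.

    Returns (stop_epoch, best_epoch): training halts once `patience`
    consecutive epochs pass without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: a dip, a rise, then a second descent.
val_losses = [1.00, 0.80, 0.70, 0.75, 0.78, 0.74, 0.65, 0.60]

print(early_stopping_epoch(val_losses, patience=1))  # (3, 2): keeps loss 0.70
print(early_stopping_epoch(val_losses, patience=5))  # (7, 7): reaches loss 0.60
```

With no patience the rule locks in the first minimum and never sees the better one. A longer patience is not free either, since it costs compute and can overshoot in the classic regime, which is why the setting should follow from the diagnosis rather than a default.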
The Pattern Behind the Myths
Nearly every myth is a useful heuristic that someone hardened into a universal rule. More data often helps. Simpler often resists overfitting. The error is dropping the "often." The reality is always conditional on what your measurements show — the learning curve, the gap, the per-segment numbers. Measure the specific model in front of you instead of applying a slogan, and the myths stop costing you.
Frequently Asked Questions
Does more data ever make overfitting worse?
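Rarely. More clean, representative data does not usually make overfitting worse, but it often does nothing if your learning curve has already flattened. The one caveat is the double-descent regime, where held-out error is not guaranteed to fall steadily as the dataset grows. In most projects the risk is not harm; it is wasted spend on data that cannot help, when the real problem is capacity or features.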
Is a model that scores 100% on training always bad?
Not for large over-parameterized models, which can fit training data perfectly and still generalize well, a behavior associated with double descent. The only valid test is held-out performance. Never infer overfitting from training fit alone for modern models.
Why isn't the simplest model always the safest choice?
Because a model too simple for the task underfits and caps performance below what the problem needs. Resisting overfitting is not the same as being good. Aim for the right capacity, found by measuring the gap, not the minimum.
Can cross-validation cause overfitting?
Indirectly, yes. Cross-validation estimates generalization but does not prevent overfitting, and tuning many hyperparameters against cross-validation results can overfit to the cross-validation procedure itself. Treat it as measurement, not as a cure.
Did foundation models make overfitting irrelevant?
No. Small-data fine-tuning overfits quickly, benchmark contamination inflates frozen-model scores, and weak retrieval starves capable models. The failure modes persist and often arrive faster — they simply wear new names.
Key Takeaways
- Most myths are true heuristics over-generalized into false rules; the reality is always conditional on your measurements.
- More data helps only while the learning curve is still climbing; a flat curve points to capacity or feature problems.
- Perfect training fit no longer proves overfitting for large models — measure held-out performance.
- Simpler is not automatically safer; too simple underfits, and over-regularizing manufactures underfitting.
- Cross-validation measures generalization rather than preventing overfitting, and foundation models changed the vocabulary, not the phenomena.