
Seven Walls Every ML Beginner Hits in the Same Order

Agency Script Editorial, Editorial Team · March 27, 2026 · 10 min read

Machine learning is not complicated because the math is hard. It's complicated because the easy-sounding steps (gather data, train a model, evaluate it, deploy it) each contain a dozen quiet ways to fail. Most practitioners hit the same seven walls in the same order, waste weeks diagnosing symptoms instead of causes, and then rebuild using practices that were available to them from the start. This article names those walls directly.

The failure modes below aren't theoretical. They appear in agency projects, internal tooling, client deliverables, and enterprise pilots alike. Some cost a sprint. Others corrupt an entire product line before anyone notices. Understanding why each happens, what it actually costs, and what the corrective practice looks like is the fastest way to compress your learning curve and protect your organization's credibility.

If you're building fluency from the ground up, pair this with A Framework for Machine Learning Basics to see how these mistakes map onto a structured process. If you're running projects for clients, The Machine Learning Basics Checklist for 2026 gives you a concrete audit format to catch these problems before they ship.


Mistake 1: Treating the Problem as a Modeling Problem Before It's a Data Problem

Most practitioners spend their first month learning algorithms. They study linear regression, decision trees, gradient boosting, and neural networks — and then they try to select the right one before they have adequately examined what they're feeding it. This is backwards.

Why it happens

Algorithms are visible, teachable, and intellectually satisfying. Data quality work is tedious and unglamorous. Training-data documentation, inference pipelines, and feature engineering feel like infrastructure: important but boring. So people skip to the part that feels like progress.

What it costs

A model trained on dirty, incomplete, or systematically biased data will learn the wrong patterns with high confidence. You'll see reasonable training metrics, deploy to production, and then watch performance degrade or produce outputs that make no business sense. Diagnosing this after deployment is far harder and more expensive than catching it before training.

The corrective practice

Before selecting any model architecture, audit your data for: missing values and their distribution (random missing vs. structurally missing are different problems), label quality, class imbalance, and data leakage between train and test splits. A reasonable rule of thumb: spend at least as much time on data inspection and preparation as you spend on model selection and tuning. Often more.
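A minimal sketch of that audit in pandas, assuming a tabular dataset already split into `train` and `test` DataFrames with a label column. The `audit` helper and the toy churn data are hypothetical and illustrative, not a complete data-quality suite. The missing-by-label breakdown is one quick signal that missingness is structural rather than random, and the shared-rows count catches one common form of train/test leakage.

```python
import pandas as pd


def audit(train: pd.DataFrame, test: pd.DataFrame, label: str) -> dict:
    """Hypothetical pre-modeling audit: a few cheap checks, not a full data-quality suite."""
    features = [c for c in train.columns if c != label]
    report = {}
    # Share of missing values per feature in the training split.
    report["missing_rate"] = train[features].isna().mean().round(3).to_dict()
    # Missing rate per feature within each class; large differences suggest
    # structural (non-random) missingness that the model may learn to exploit.
    report["missing_by_label"] = (
        train[features].isna().groupby(train[label]).mean().round(3).to_dict()
    )
    # Class balance of the label.
    report["class_share"] = train[label].value_counts(normalize=True).round(3).to_dict()
    # Feature rows that appear in both splits: a common source of train/test leakage.
    shared = train[features].drop_duplicates().merge(
        test[features].drop_duplicates(), how="inner"
    )
    report["rows_in_both_splits"] = len(shared)
    return report


# Toy, made-up churn data for illustration only.
train = pd.DataFrame({
    "age": [34.0, None, 51.0, 29.0, None, 42.0],
    "plan": ["a", "b", "a", "a", "b", "a"],
    "churned": [0, 1, 0, 0, 1, 0],
})
test = pd.DataFrame({"age": [34.0, 60.0], "plan": ["a", "b"], "churned": [0, 1]})

for key, value in audit(train, test, label="churned").items():
    print(key, value)
```

In this toy data every missing `age` belongs to a churned customer, which is exactly the kind of structural pattern worth investigating before any model sees it.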


Mistake 2: Evaluating Models with the Wrong Metrics

Accuracy is the default metric everyone reaches for. It is also the metric most likely to mislead you, particularly when your classes are imbalanced, which they frequently are in real-world problems.

Why it happens

Accuracy is intuitive. If 95% of predictions are correct, the model must be good. But if 95% of your data belongs to one class, a model that predicts that class every single time achieves 95% accuracy while being completely useless.
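The arithmetic behind that trap fits in a few lines of plain Python. The helper names are illustrative, and the 95/5 split mirrors the example above: accuracy rewards a "model" that never detects a single positive case, while recall exposes it immediately.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives the model caught."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_on_positives:
        return 0.0
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)


# 95 negatives and 5 positives (e.g. fraud cases).
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```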

What it costs

You ship a fraud detection model, a churn predictor, or a medical triage classifier that looks excellent on paper and fails on the cases that actually matter. The failure is severe, yet because the headline metric looked fine, it often takes weeks or quarters to surface.

The corrective practice

Choose metrics that match the business problem. For imbalanced classification: precision, recall, F1, and AUC-ROC. For regression: MAE and RMSE differ in how they treat large errors. RMSE squares each error before averaging, so a few big misses dominate it, while MAE weights every unit of error equally. For ranking problems: metrics like NDCG or mean average precision. Define your success metric before you train anything. Write it into the project brief and treat metric changes mid-project as scope changes requiring deliberate decisions.
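A small worked comparison, using made-up numbers, shows why the MAE/RMSE choice matters: the two forecasts below have identical MAE, but the one containing a single large miss has twice the RMSE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: squaring makes large errors dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [100, 100, 100, 100]
steady = [110, 90, 110, 90]          # four errors of 10
one_big_miss = [100, 100, 100, 140]  # three perfect predictions, one error of 40

print(mae(y_true, steady), rmse(y_true, steady))              # 10.0 10.0
print(mae(y_true, one_big_miss), rmse(y_true, one_big_miss))  # 10.0 20.0
```

If a single large miss is what hurts the business (an empty shelf, a blown budget), RMSE reflects that; if every unit of error costs the same, MAE is the more honest summary.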


Mistake 3: Data Leakage

Data leakage is when information from the future (or from the test set) bleeds into training. The model appears to perform brilliantly during evaluation and catastrophically in production.

Why it happens

Leakage has subtle triggers: normalizing or scaling the entire dataset before splitting it into train and test sets, including a feature that is only available after the outcome occurs, or splitting time-stamped data randomly instead of by time, so that later records train a model evaluated on earlier ones. Each of these is easy to do accidentally, especially if you're working quickly or adapting someone else's code.

What it costs

Your evaluation metrics are simply lies. A model showing 98% accuracy due to leakage might perform at 60% in production — and you won't know until it's live. The wasted compute is one cost. The wasted trust — from clients, stakeholders, or internal teams — is harder to recover.

The corrective practice

Always split your data before any preprocessing. Apply scalers, encoders, and imputers by fitting only on the training fold, then transforming both train and test. For time-series data, use a strict temporal split where no future data can appear in any training fold. Review every feature and ask: "Would this value be available at prediction time?" If not, remove it.
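One common way to enforce "fit on the training fold only" is to put preprocessing inside a scikit-learn `Pipeline`, so cross-validation refits the scaler on each training fold instead of on the full dataset. The sketch below assumes synthetic data whose rows are already in chronological order and uses `TimeSeriesSplit`, which validates each fold only on rows that come after its training rows. The features, label, and choice of logistic regression are placeholders; the leaky alternative would be calling `StandardScaler().fit(X)` on all of `X` before splitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; rows are assumed to be in chronological order.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# The scaler sits inside the pipeline, so each CV fold fits it on that fold's
# training rows only; validation rows are transformed, never fitted on.
model = make_pipeline(StandardScaler(), LogisticRegression())

# TimeSeriesSplit trains on earlier rows and validates on the rows that follow.
cv = TimeSeriesSplit(n_splits=5)
no_future_in_training = all(tr.max() < te.min() for tr, te in cv.split(X))

scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("chronological folds:", no_future_in_training)
print("ROC AUC per fold:", scores.round(3))
```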


Mistake 4: Overfitting and Underfitting Without Diagnosing Which Problem You Have

Overfitting (the model memorizes training data, generalizes poorly) and underfitting (the model is too simple to capture real patterns) are taught as opposites. In practice, many practitioners can identify them in a textbook but misdiagnose them in their own work, then apply the wrong fix.

Why it happens

Both problems can produce similar-looking validation curves early in training. Practitioners also tend to over-attribute poor performance to overfitting because adding regularization or more data feels like a productive response, even when the real issue is a model that's too constrained to learn the pattern.

What it costs

Applying L2 regularization or dropout to an underfitting model typically makes it worse. Collecting more data is expensive even when it is the right fix for overfitting, and it does little for a model that is too simple to learn the pattern. Misdiagnosis burns time and resources on the wrong intervention.

The corrective practice

Plot your training loss and validation loss against training steps or epochs. Overfitting looks like a widening gap — training loss decreases while validation loss stagnates or rises. Underfitting looks like both losses remaining high and flat. Diagnosis first; then choose interventions. For overfitting: regularization, dropout, early stopping, more training data. For underfitting: more model capacity, better feature engineering, longer training. See Machine Learning Basics: Best Practices That Actually Work for a structured diagnostic approach.
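Reading the plotted curves is the real diagnostic, but the same rules can be written down as a rough check, for example to flag runs in an experiment log. This is a heuristic sketch: the `diagnose` function is hypothetical, the example curves are made up, and the `high_loss` and `gap_tolerance` thresholds depend entirely on the scale of your loss function.

```python
def diagnose(train_loss, val_loss, high_loss=0.5, gap_tolerance=0.1):
    """Classify a pair of loss curves (one value per epoch) with rough rules."""
    final_train, final_val = train_loss[-1], val_loss[-1]
    # Overfitting: validation ends well above training, or has climbed back above its best value.
    gap_is_wide = final_val - final_train > gap_tolerance
    val_is_rising = final_val > min(val_loss) + gap_tolerance
    if gap_is_wide or val_is_rising:
        return "overfitting"
    # Underfitting: the curves stay close together but training loss never gets low.
    if final_train > high_loss:
        return "underfitting"
    return "no clear problem"


curves = {
    "run_a": ([0.9, 0.6, 0.4, 0.2, 0.1], [0.9, 0.7, 0.6, 0.65, 0.75]),
    "run_b": ([0.95, 0.9, 0.88, 0.87, 0.87], [0.96, 0.92, 0.9, 0.89, 0.89]),
    "run_c": ([0.9, 0.5, 0.3, 0.25], [0.9, 0.55, 0.35, 0.3]),
}
for run, (train_loss, val_loss) in curves.items():
    print(run, diagnose(train_loss, val_loss))
```

`run_a` shows the widening gap, `run_b` shows two high, flat curves, and `run_c` tracks closely at a low loss.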


Mistake 5: Skipping Baseline Models

The instinct when starting an ML project is to reach for the most powerful tool available — XGBoost, a fine-tuned transformer, a deep neural network. The problem is that without a baseline, you have no frame of reference for whether complexity is earning its keep.

Why it happens

Complex models feel like serious work. A logistic regression or a simple decision tree feels like giving up. This is a cultural bias, not a technical one.

What it costs

You spend significant compute and engineering time on a sophisticated model that beats a simple baseline by 0.3% — which may not justify the complexity, maintenance overhead, or inference latency. Or, worse, you assume your model is performing well because it looks impressive on a dashboard, when a naive heuristic would have done as well.

The corrective practice

Always define and implement a baseline before training anything complex. A baseline can be as simple as "predict the most common class," a single-feature linear model, or a rule-based system already in use. Your sophisticated model needs to beat the baseline by a margin that justifies its cost. If it doesn't, consider whether you have a model selection problem or a feature problem.
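A minimal sketch of the baseline-first habit with scikit-learn, assuming a synthetic imbalanced dataset from `make_classification`. `DummyClassifier(strategy="most_frequent")` implements the "predict the most common class" baseline, and F1 is used because the classes are skewed (see Mistake 2). The candidate models are placeholders; what counts as a margin that justifies the extra cost remains a project decision, not something the code decides.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 10% positives).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

candidates = {
    "majority class": DummyClassifier(strategy="most_frequent"),      # simplest baseline
    "logistic regression": LogisticRegression(max_iter=1000),         # simple, cheap model
    "gradient boosting": GradientBoostingClassifier(random_state=0),  # the "powerful" option
}

results = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    # zero_division=0 reports 0.0 instead of warning when no positives are predicted.
    results[name] = f1_score(y_te, model.predict(X_te), zero_division=0)
    print(f"{name:<20} F1 = {results[name]:.3f}")
```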


Mistake 6: Ignoring Deployment and Inference Requirements During Model Design

The gap between "works in a notebook" and "runs in production" is where many ML projects die quietly. Teams build models without asking how they'll be served, at what latency, with what infrastructure, and with what freshness requirements.

Why it happens

Research and experimentation happen in isolated environments — Jupyter notebooks, local machines, managed cloud notebooks. These environments abstract away serving constraints. It's easy to train a model that requires 32GB of RAM for inference and only discover this when someone asks why the API is timing out.

What it costs

Model redesign after the fact is expensive. So is forcing an infrastructure team to provision specialized hardware to run an unnecessarily large model. Latency failures in production can kill user trust and, in real-time applications, entire product features.

The corrective practice

Define inference requirements before training. Ask: what is the acceptable latency for a single prediction? What is the expected request volume? Does this model need to run on edge hardware, a mobile device, or a shared API endpoint? These constraints should inform your model architecture choices from the start. Quantization, distillation, and ONNX export are all easier to plan for early than to retrofit later. For real-world illustrations of how deployment constraints shape architecture choices, Machine Learning Basics: Real-World Examples and Use Cases covers several applied scenarios.
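Those questions are easier to enforce when the budget is checked in code during development rather than discovered at the API. The sketch below times single-row predictions and uses pickled size as a rough proxy for model footprint (not actual RAM use at serving time). The `predict` function, the toy linear model, and the 50 ms / 100 MB budget are hypothetical, and timings on a development machine will not match production hardware.

```python
import pickle
import statistics
import time


def inference_profile(predict, sample, model_obj, n_runs=200):
    """Time single-row predictions and measure the serialized model size."""
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": statistics.median(timings_ms),
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
        "p95_ms": statistics.quantiles(timings_ms, n=100)[94],
        "size_mb": len(pickle.dumps(model_obj)) / 1e6,
    }


def meets_budget(profile, max_p95_ms, max_size_mb):
    """True if both tail latency and model size fit the stated budget."""
    return profile["p95_ms"] <= max_p95_ms and profile["size_mb"] <= max_size_mb


# Hypothetical toy model: a linear scorer over 20 features.
weights = [0.1] * 20


def predict(row):
    return sum(w * x for w, x in zip(weights, row))


profile = inference_profile(predict, sample=[1.0] * 20, model_obj=weights)
print(profile, meets_budget(profile, max_p95_ms=50, max_size_mb=100))
```

Tracking the 95th percentile rather than the average is a deliberate choice here: users and upstream timeouts feel the slow tail, not the mean.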


Mistake 7: Treating Model Deployment as a Finish Line

The final and most expensive mistake is the assumption that once a model is deployed, the work is done. Models degrade. Data distributions shift. User behavior changes. What worked at launch may actively fail six months later.

Why it happens

ML project timelines are typically structured around delivery: build, test, deploy. There's rarely a phase called "maintain and monitor" with dedicated ownership. Once the handoff happens, monitoring often falls through the gaps between teams.

What it costs

Model drift in a customer-facing system produces degraded recommendations, miscategorized content, incorrect predictions — often silently. By the time it surfaces as a user complaint or a business metric decline, the root cause is weeks or months in the past. Rebuilding trust after a visible model failure is costly.

The corrective practice

Build monitoring into the deployment plan from day one. At minimum, track: prediction distribution over time, feature distribution over time, and ground-truth label comparison when feedback is available. Set explicit thresholds that trigger retraining or review. Assign clear ownership. A Case Study: Machine Learning Basics in Practice illustrates what a functional post-deployment monitoring workflow looks like in an agency context.
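For the feature-distribution check, one widely used drift statistic is the population stability index (PSI), which compares the binned shares of a feature between a reference sample and recent production data. The sketch below is a NumPy implementation under simple assumptions: a continuous feature and ten quantile bins taken from the reference data. The synthetic samples are illustrative, and the commonly cited rule of thumb that PSI above roughly 0.2 signals meaningful shift is a convention to calibrate per feature, not a standard.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a recent production sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges at reference quantiles, so each bin holds roughly equal reference mass.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    # Clip production values into the reference range so outliers land in the end bins.
    current = np.clip(current, edges[0], edges[-1])
    ref_share = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_share = np.histogram(current, bins=edges)[0] / len(current)
    # Floor empty bins at a tiny value to avoid log(0) and division by zero.
    ref_share = np.clip(ref_share, eps, None)
    cur_share = np.clip(cur_share, eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


rng = np.random.default_rng(0)
training_feature = rng.normal(0.0, 1.0, 5000)
this_week_stable = rng.normal(0.0, 1.0, 5000)
this_week_shifted = rng.normal(0.5, 1.0, 5000)  # mean drifted by half a standard deviation

REVIEW_THRESHOLD = 0.2  # hypothetical alert threshold; calibrate per feature
psi_stable = population_stability_index(training_feature, this_week_stable)
psi_shifted = population_stability_index(training_feature, this_week_shifted)
for name, psi in [("stable", psi_stable), ("shifted", psi_shifted)]:
    print(f"{name}: PSI={psi:.3f}, review={psi > REVIEW_THRESHOLD}")
```

The same function can be applied to the model's prediction scores to cover the prediction-distribution check as well.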


Frequently Asked Questions

What is the most common mistake beginners make with machine learning basics?

The most common mistake is prioritizing model selection and tuning before adequately examining data quality. Beginners learn algorithms first because they're visible and teachable, but the largest performance gains almost always come from better data, better feature engineering, and a clearer problem definition — not from switching to a fancier model.

How do you know if your model is overfitting or underfitting?

Plot your training loss and validation loss together across training steps. If the training loss drops but the validation loss stays flat or rises, that's overfitting. If both losses are high and the model isn't learning well on either split, that's underfitting. You need to see both curves to make a confident diagnosis.

Why do good evaluation metrics matter so much in machine learning basics?

The metric you optimize becomes the model's definition of "success." If that definition doesn't match your actual business objective — for example, using accuracy on an imbalanced dataset — the model will learn to satisfy the metric, not the goal. Choosing the right metric before training is one of the highest-leverage decisions in any ML project.

What is data leakage and how can you prevent it?

Data leakage occurs when your model sees information during training that it wouldn't have access to at prediction time — for example, a feature computed from the outcome, or preprocessing fitted on the full dataset before splitting. Prevent it by splitting your data before any preprocessing, fitting transformations only on training data, and auditing every feature for temporal or logical contamination.

How often should a deployed model be retrained?

There's no universal answer — it depends on how quickly the data distribution changes in your domain. Some models can remain stable for months; others need weekly or even daily retraining. The practical answer is: set up monitoring that measures prediction distribution and model performance over time, and let the data tell you when drift has exceeded an acceptable threshold.


Key Takeaways

  • Data quality problems account for more ML failures than model selection problems. Start there.
  • Choose evaluation metrics before training. Match them to business outcomes, not defaults.
  • Data leakage produces falsely optimistic metrics. Split before preprocessing; audit every feature.
  • Diagnose overfitting vs. underfitting explicitly using loss curves before applying any fix.
  • Always implement a simple baseline. Complex models need to justify their cost with measurable gains.
  • Deployment constraints — latency, memory, infrastructure — must inform architecture from the start.
  • A deployed model is not a finished model. Monitoring, drift detection, and retraining ownership need explicit plans before launch.
