Understanding Bias and Variance in Machine Learning: A Complete Guide
Machine Learning models are like students — some memorize examples without truly learning the concept, while others grasp the general idea but miss the details. This tug-of-war between memorization and generalization lies at the heart of one of the most fundamental concepts in ML: Bias and Variance.
In this post, we’ll break down:
- What bias and variance mean.
- How they affect train and test errors.
- The bias–variance tradeoff.
- Strategies to mitigate each scenario.
🧠 What Are Bias and Variance?
Understanding bias and variance is key to diagnosing and improving machine learning models. Here’s a breakdown:
🎯 Bias: Error from Wrong Assumptions
- Definition: Bias is the error introduced by approximating a real-world problem with a model that is too simple.
- High bias means the model is too simple to capture the underlying patterns — it underfits the data.
- Low bias means the model is flexible enough to learn the true relationships.
Example:
A linear model trying to fit a complex nonlinear pattern will have high bias — it misses the mark consistently.
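To see this concretely, here’s a minimal sketch using scikit-learn and synthetic data (the sine-wave dataset is an illustrative assumption, not a prescribed one):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic nonlinear data: y = sin(x) plus a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# A straight line cannot follow the sine wave, so even the
# *training* error stays high, which is the signature of high bias.
model = LinearRegression().fit(X, y)
print("Train MSE:", mean_squared_error(y, model.predict(X)))
```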
🔄 Variance: Error from Sensitivity to Data
- Definition: Variance is the error introduced by the model’s sensitivity to small fluctuations in the training data; it measures how much the model’s predictions change with the specific points it was trained on.
- High variance means the model learns noise as if it were signal — it overfits the data.
- Low variance means the model generalizes well to new data.
Example:
A deep neural network with no regularization might perform perfectly on training data but poorly on test data — classic high variance.
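The same behavior shows up in any flexible, unregularized model. As a lighter-weight stand-in for the neural network above, here’s a sketch with an unconstrained decision tree on synthetic data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 300).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# With no depth limit, the tree memorizes every training point,
# noise included: near-zero train error, much higher test error.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
print("Train MSE:", mean_squared_error(y_tr, tree.predict(X_tr)))
print("Test MSE: ", mean_squared_error(y_te, tree.predict(X_te)))
```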
📊 Interpreting Train vs. Test Error in terms of Bias and Variance
High-bias models produce high training error and high test error because they fail to fit even the training data, whereas high-variance models have low training error but high test error.
🔍 Bias and Train Error
- Bias is the error due to overly simplistic assumptions in the model.
- If your train error is high, the model isn’t fitting the training data well → high bias.
- If your train error is low, the model is capturing the training data patterns → low bias.
🔄 Variance and Test Error
- Variance is the error due to the model being too sensitive to the training data.
- If your test error is much higher than train error, the model is overfitting → high variance.
- If your test error is close to train error, the model generalizes well → low variance.
Here’s a simple mental model:

| Train error | Test error | Diagnosis |
| --- | --- | --- |
| High | High | High bias (underfitting) |
| Low | High (well above train) | High variance (overfitting) |
| Low | Low (close to train) | Balanced; generalizes well |

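If you want that rule of thumb in code, here’s a rough sketch; the baseline comparison and tolerance values are illustrative assumptions, not standard thresholds:

```python
def diagnose(train_err, test_err, baseline_err, fit_tol=0.5, gap_tol=0.2):
    """Rough bias/variance read from train/test error.

    baseline_err: error of a trivial model (e.g. always predicting
    the mean), used to judge whether train error counts as "high".
    """
    if train_err > fit_tol * baseline_err:
        return "high bias (underfitting)"
    if test_err > train_err * (1 + gap_tol):
        return "high variance (overfitting)"
    return "balanced: generalizes well"

# Example: train MSE 0.09, test MSE 0.40, mean-predictor MSE 0.50
print(diagnose(0.09, 0.40, 0.50))  # -> high variance (overfitting)
```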
⚖️ The Bias-Variance Tradeoff
The goal is to find a balance:
- Too simple → high bias, low variance
- Too complex → low bias, high variance
👉 The sweet spot is a model that captures the true signal without fitting the noise; in practical terms, that is the point where test error is minimal.
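One practical way to find that sweet spot is to sweep a complexity knob and watch where validation error bottoms out. A minimal sketch, using polynomial degree as the knob (the degrees and data here are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Low degrees underfit (high bias); very high degrees overfit
# (high variance). Validation error is lowest in between.
for degree in [1, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree={degree:2d}  "
          f"train={mean_squared_error(y_tr, model.predict(X_tr)):.3f}  "
          f"val={mean_squared_error(y_val, model.predict(X_val)):.3f}")
```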
🧠 How to Mitigate Bias and Variance Problems
Let’s look at strategies to handle each scenario.
1️⃣ High Bias (Underfitting)
Symptoms:
- High training error.
- High test error.
- Model fails to capture patterns.
Fixes:
- Increase model complexity (e.g., use polynomial features or a deeper neural network).
- Reduce regularization (lower L1/L2 penalty).
- Add more relevant features.
- Train longer (if undertrained).
Example:
If your linear regression model performs poorly on both training and test data, try switching to a polynomial regression or a tree-based model.
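As a sketch of that switch (scikit-learn; `X_train` and `y_train` stand in for your own data):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Same linear learner, but on expanded features (x, x^2, x^3):
# extra capacity to bend toward the data, which lowers bias.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
# model.fit(X_train, y_train)  # X_train, y_train: your own data
```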
2️⃣ High Variance (Overfitting)
Symptoms:
- Low training error.
- High test error.
- Model fits noise rather than signal.
Fixes:
- Simplify the model (reduce depth or layers).
- Add regularization (L1, L2, dropout).
- Collect more training data.
- Use cross-validation to tune hyperparameters.
- Use techniques like bagging (e.g., Random Forests) or dropout (in neural networks).
Example:
If your deep neural network achieves 99% training accuracy but 70% test accuracy, you may need dropout layers or early stopping.
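In Keras, for instance, both fixes take only a few lines (a sketch; the layer sizes, dropout rate, and patience are placeholder values):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # randomly zero half the units while training
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop training once validation loss stops improving for 5 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```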
3️⃣ Balanced Bias and Variance
When both bias and variance are under control:
- Training and test errors are both low and close.
- Model generalizes well.
- Hyperparameters are well-tuned.
To reach this zone:
- Use cross-validation to monitor generalization performance.
- Apply regularization gradually rather than aggressively.
- Keep a validation set separate from your training data.
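A minimal sketch of that workflow with scikit-learn (the estimator choice and split sizes are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

# Hold out a final test set; tune and validate only on the rest.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation on the training portion to monitor
# generalization while tuning hyperparameters.
scores = cross_val_score(Ridge(alpha=1.0), X_tr, y_tr, cv=5)
print("CV R^2 scores:", scores.round(3))
```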
🚀 Takeaway
Understanding bias and variance is key to becoming a better ML practitioner.
They explain why your model behaves the way it does and how to improve it.
