Your model achieves 99 percent accuracy on training data. You’re celebrating the result until you test it on new data and accuracy drops to 60 percent. What just happened? You’ve encountered overfitting, one of the most common and frustrating problems in machine learning.
Understanding the balance between overfitting and underfitting is critical for building models that actually work in the real world. Your model needs to learn genuine patterns from training data without memorizing noise or missing important relationships. Finding this balance separates models that work in development from those that succeed in production.
Put simply, overfitting and underfitting mean your model learned either too much or too little from the training data. Overfitting happens when your model memorizes training examples instead of learning generalizable patterns. Underfitting happens when your model is too simple to capture the real relationships in your data. Let me show you exactly what both look like and how to fix them.
What overfitting actually looks like
Overfitting means your model performs excellently on training data but poorly on new data. It learned the training data too well, including random noise and peculiarities that don’t represent true patterns.
Imagine studying for a test by memorizing all practice problems word for word. You’d ace those exact problems but struggle with any variation. That’s overfitting. Your model memorized specific training examples rather than learning underlying rules.
A visual example makes this concrete. Suppose you’re predicting house prices from square footage. Your training data has 20 houses. An overfit model might create a complex curve that passes through every single training point perfectly.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Generate sample data with noise
np.random.seed(42)
X = np.linspace(0, 10, 20).reshape(-1, 1)
y_true = 2 * X.ravel() + 1
y = y_true + np.random.normal(0, 1.5, X.shape[0])
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
# Overfit model (degree 15 polynomial)
poly_overfit = PolynomialFeatures(degree=15)
X_train_poly = poly_overfit.fit_transform(X_train)
X_test_poly = poly_overfit.transform(X_test)
overfit_model = LinearRegression()
overfit_model.fit(X_train_poly, y_train)
# Evaluate
train_score = overfit_model.score(X_train_poly, y_train)
test_score = overfit_model.score(X_test_poly, y_test)
print(f"Overfit model training R2: {train_score:.4f}")
print(f"Overfit model test R2: {test_score:.4f}")
The overfit model posts a near-perfect training score but a terrible test score. The complex curve fits training noise that doesn’t appear in test data. Between training points, the curve makes wild swings that don’t reflect reality.
Signs of overfitting include a large gap between training and validation performance. If training accuracy is 95 percent but validation accuracy is 70 percent, you’re overfitting. The model learned patterns specific to the training set that don’t generalize.
Complex models are more prone to overfitting. Deep neural networks with millions of parameters can memorize entire datasets. Decision trees that grow very deep create specific rules for tiny groups of examples. More model capacity means more opportunity to overfit.
Small training datasets increase overfitting risk. With only 50 examples, your model might learn those specific 50 cases perfectly without discovering general patterns. More training data makes it harder to memorize everything and easier to find real patterns.
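If you want to see this effect directly, a learning curve helps: hold the model fixed, grow the training set, and watch the gap between training and validation scores. The sketch below uses scikit-learn’s learning_curve on a synthetic stand-in dataset with an unconstrained decision tree; swap in your own data and model.
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
import numpy as np
# Synthetic stand-in data; replace with your own X and y
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)
# Score an unconstrained (easy-to-overfit) tree at increasing training set sizes
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X_demo, y_demo,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} training examples: train={tr:.3f}, validation={va:.3f}")
An unconstrained tree keeps its training score near perfect, so watch the validation column: it should climb as the training set grows, narrowing the gap.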
What underfitting looks like in practice
Underfitting is the opposite problem. Your model is too simple to capture the true relationships in your data. It performs poorly on both training and test data because it never learned the patterns.
Using the same house price example, an underfit model might fit a horizontal line through the data. The line completely ignores the relationship between square footage and price. It just predicts the average price for every house.
# Underfit model: ignore square footage entirely and fit only a constant
underfit_model = LinearRegression()
underfit_model.fit(np.ones_like(X_train), y_train)
# With a constant feature the model can do no better than the training mean
train_pred = underfit_model.predict(np.ones_like(X_train))
test_pred = underfit_model.predict(np.ones_like(X_test))
from sklearn.metrics import r2_score
train_score_under = r2_score(y_train, train_pred)
test_score_under = r2_score(y_test, test_pred)
print(f"Underfit model training R2: {train_score_under:.4f}")
print(f"Underfit model test R2: {test_score_under:.4f}")
The underfit model performs poorly on both training and test data. It’s too simple to represent the actual relationship. Poor performance across all data indicates underfitting.
Underfitting happens when your model lacks capacity to learn the patterns. A linear model can’t capture non-linear relationships. A shallow decision tree can’t represent complex decision boundaries. A neural network with one neuron can’t learn much of anything.
Insufficient training also causes underfitting. If you stop training a neural network after 5 iterations when it needs 100, it hasn’t learned enough. The model has capacity but didn’t use it.
Poor features lead to underfitting too. If your features don’t contain information relevant to the prediction, even a powerful model can’t learn useful patterns. Predicting house prices from the owner’s favorite color won’t work no matter how sophisticated your model.
Finding the right balance
The goal is a model that generalizes well. It should perform reasonably well on training data and similarly on test data. A small gap between training and test performance indicates good generalization.
A properly fit model for our house price example might use a simple linear regression or a low-degree polynomial. It captures the general upward trend without fitting every noise fluctuation.
# Good fit model (simple linear regression)
good_model = LinearRegression()
good_model.fit(X_train, y_train)
train_score_good = good_model.score(X_train, y_train)
test_score_good = good_model.score(X_test, y_test)
print(f"Good model training R2: {train_score_good:.4f}")
print(f"Good model test R2: {test_score_good:.4f}")
The properly fit model has similar performance on training and test data. Both scores are reasonable, not perfect on training and terrible on test. This indicates the model learned generalizable patterns.
Finding the balance requires monitoring both training and validation performance during development. Plot both as you increase model complexity. Training performance improves continuously. Validation performance improves then degrades as you overfit.
The optimal model complexity is where validation performance peaks. Before that point you’re underfitting; after it, you’re overfitting. This tension is known as the bias-variance tradeoff.
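A quick way to watch this happen, reusing X_train, X_test, y_train, and y_test from the house price example above, is to sweep the polynomial degree and print both scores. This is only a rough sketch of a validation curve; scikit-learn’s validation_curve utility does the same thing with cross-validation.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Sweep model complexity (polynomial degree) and track both scores
for degree in [1, 2, 3, 5, 8, 12, 15]:
    poly = PolynomialFeatures(degree=degree)
    X_tr = poly.fit_transform(X_train)
    X_te = poly.transform(X_test)
    model = LinearRegression().fit(X_tr, y_train)
    print(f"degree {degree:2d}: train R2={model.score(X_tr, y_train):.3f}, "
          f"test R2={model.score(X_te, y_test):.3f}")
Expect training R2 to climb toward 1.0 as the degree grows while test R2 peaks at a low degree and then falls away.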
High bias models underfit because they make strong assumptions that don’t match reality. High variance models overfit because they’re too sensitive to training data fluctuations. You want low bias and low variance, but there’s usually a tradeoff.
Techniques to prevent overfitting
Several strategies help prevent overfitting once you recognize it. Collecting more training data is the most effective solution. More examples make it harder to memorize everything and easier to find real patterns.
Regularization adds penalties for model complexity. L1 and L2 regularization penalize large coefficient values in linear models. Dropout randomly disables neurons during neural network training. These techniques force models to learn simpler patterns.
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Scale the polynomial features first: regularization penalties are
# scale-sensitive, and degree-15 features span many orders of magnitude
# Ridge regression (L2 regularization)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(X_train_poly, y_train)
# Lasso regression (L1 regularization)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000))
lasso.fit(X_train_poly, y_train)
print(f"Ridge test R2: {ridge.score(X_test_poly, y_test):.4f}")
print(f"Lasso test R2: {lasso.score(X_test_poly, y_test):.4f}")
Early stopping prevents overfitting in iterative training. Monitor validation performance during training and stop when it starts degrading. The model at peak validation performance generalizes best.
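Many scikit-learn estimators support this directly. As a rough sketch, here is gradient boosting on the house price split from earlier, holding out an internal validation set and stopping after 10 rounds without improvement; with only 14 training points this is purely illustrative.
from sklearn.ensemble import GradientBoostingRegressor
# n_estimators is only an upper bound; training stops once the internal
# validation score fails to improve for n_iter_no_change consecutive rounds
gbr = GradientBoostingRegressor(
    n_estimators=1000,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=42
)
gbr.fit(X_train, y_train)
print(f"Boosting rounds actually used: {gbr.n_estimators_}")
print(f"Test R2: {gbr.score(X_test, y_test):.4f}")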
Reducing model complexity helps too. Use fewer features through feature selection. Limit tree depth in decision trees. Use fewer layers or neurons in neural networks. Simpler models can’t overfit as severely.
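One hypothetical way to combine these ideas in scikit-learn is to pair feature selection with a deliberately constrained tree; the synthetic dataset below is just a stand-in for your own.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
# Stand-in data: 20 features, only 5 of which carry signal
X_demo, y_demo = make_classification(n_samples=500, n_features=20,
                                     n_informative=5, random_state=0)
# Limit capacity two ways: keep only the 5 strongest features, then cap the
# tree's depth and require a minimum number of samples in every leaf
simpler_model = make_pipeline(
    SelectKBest(f_classif, k=5),
    DecisionTreeClassifier(max_depth=4, min_samples_leaf=20, random_state=0)
)
simpler_model.fit(X_demo, y_demo)
print(f"Training accuracy: {simpler_model.score(X_demo, y_demo):.3f}")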
Cross-validation provides better performance estimates and helps detect overfitting. Train on multiple different data splits. If performance varies wildly across folds, you’re likely overfitting to specific data characteristics.
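For example, five-fold cross-validation on the full house price data (X and y from earlier) reports one score per fold; a large spread across folds is a warning sign.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Each fold trains on 80 percent of the data and validates on the remaining 20
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(f"Fold R2 scores: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, spread (std): {scores.std():.3f}")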
Data augmentation creates artificial training examples by applying transformations to existing data. For images, you might rotate, flip, or crop. This increases effective dataset size without collecting new data.
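Here is a toy NumPy sketch of the idea, with a random array standing in for a real image; in practice, libraries such as torchvision or Keras ship these transformations ready-made.
import numpy as np
# A 32x32 grayscale "image" standing in for a real training example
image = np.random.rand(32, 32)
# Each transformed copy becomes an extra training example
augmented = [
    np.fliplr(image),                                               # horizontal flip
    np.flipud(image),                                               # vertical flip
    np.rot90(image),                                                # 90 degree rotation
    image[2:30, 2:30],                                              # crop to 28x28
    np.clip(image + np.random.normal(0, 0.05, image.shape), 0, 1),  # add pixel noise
]
print(f"1 original image -> {len(augmented)} augmented variants")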
Techniques to fix underfitting
Underfitting requires opposite solutions. Increase model complexity so it has capacity to learn patterns. Use more features, deeper trees, or larger neural networks.
Add relevant features through feature engineering. Create polynomial features for non-linear relationships. Extract date components from timestamps. Combine existing features in meaningful ways.
# Add polynomial features to fix underfitting
poly_features = PolynomialFeatures(degree=2)
X_train_poly2 = poly_features.fit_transform(X_train)
X_test_poly2 = poly_features.transform(X_test)
better_model = LinearRegression()
better_model.fit(X_train_poly2, y_train)
print(f"Better model test R2: {better_model.score(X_test_poly2, y_test):.4f}")
Train longer if using iterative algorithms. Neural networks need enough epochs to converge. Gradient boosting needs enough trees to learn all patterns. Don’t stop training prematurely.
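To make this concrete, here is a rough comparison on a synthetic stand-in problem: the same small neural network stopped after 5 iterations versus trained for 2000. Expect a ConvergenceWarning on the short run, which is exactly the point.
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
# Stand-in regression problem
X_r, y_r = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
for iters in [5, 2000]:
    mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=iters, random_state=0)
    mlp.fit(X_r, y_r)
    print(f"max_iter={iters}: training R2={mlp.score(X_r, y_r):.3f}")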
Reduce regularization strength if it’s too aggressive. Heavy regularization prevents overfitting but can cause underfitting. Try lower penalty values to give the model more flexibility.
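A quick sweep over the penalty strength shows the effect, reusing the degree-15 polynomial features from the overfitting example and scaling them so alpha is comparable across runs.
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# A huge alpha flattens the model toward underfitting; a tiny one barely regularizes
for alpha in [1000.0, 10.0, 1.0, 0.01]:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train_poly, y_train)
    print(f"alpha={alpha:7.2f}: test R2={model.score(X_test_poly, y_test):.4f}")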
Check your data quality. Missing values, incorrect labels, or irrelevant features all hurt learning. Clean data and good features are prerequisites for any model to work.
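A few one-liners in pandas catch the most common problems before any modeling starts; the DataFrame below is hypothetical.
import pandas as pd
# Hypothetical housing data with a missing value and a suspicious outlier
df = pd.DataFrame({
    "sqft": [1400, None, 1550, 90000],
    "price": [250000, 310000, None, 275000],
})
print(df.isnull().sum())      # missing values per column
print(df.describe())          # ranges and quartiles expose implausible values
print(df.duplicated().sum())  # exact duplicate rows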
Real world example with classification
Let’s see overfitting vs underfitting with a classification problem. We’ll predict whether customers buy a product based on age and income.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Create classification data
X_class, y_class = make_classification(
    n_samples=200,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    random_state=42
)
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_class, y_class, test_size=0.3, random_state=42
)
# Underfit tree (max_depth=1)
underfit_tree = DecisionTreeClassifier(max_depth=1, random_state=42)
underfit_tree.fit(X_train_c, y_train_c)
# Good fit tree (max_depth=5)
good_tree = DecisionTreeClassifier(max_depth=5, random_state=42)
good_tree.fit(X_train_c, y_train_c)
# Overfit tree (no depth limit)
overfit_tree = DecisionTreeClassifier(random_state=42)
overfit_tree.fit(X_train_c, y_train_c)
# Compare results
for name, model in [('Underfit', underfit_tree),
                    ('Good', good_tree),
                    ('Overfit', overfit_tree)]:
    train_acc = accuracy_score(y_train_c, model.predict(X_train_c))
    test_acc = accuracy_score(y_test_c, model.predict(X_test_c))
    print(f"{name}: Train={train_acc:.3f}, Test={test_acc:.3f}")
The underfit tree has poor accuracy on both sets. The overfit tree has perfect or near perfect training accuracy but lower test accuracy. The properly fit tree has reasonable accuracy on both with a small gap.
Understanding overfitting and underfitting through concrete examples helps you diagnose model problems. A large train/test gap indicates overfitting. Poor performance on both indicates underfitting. Similar, reasonable performance on both indicates a good fit. Ready to prevent overfitting systematically? Check out our guide on regularization in machine learning to learn L1 and L2 techniques that keep your models from memorizing training data.

