What is Overfitting?
Overfitting is a common problem in machine learning where a model becomes too finely tuned to its training data. Instead of learning the underlying patterns that generalise well, the model memorises specific details – including random noise and errors – that won't recur in new, real-world data.
Think of it like studying for an exam by memorising exact sample questions rather than understanding the principles. You might ace the sample test, but struggle with different questions on the actual exam.
Why Overfitting Matters in Advertising
In media buying and marketing, overfitting can significantly impact campaign performance:
Poor Prediction Accuracy: An overfitted model might predict perfectly on historical campaign data but fail when targeting new audience segments or market conditions.
Wasted Budget: Your AI might identify patterns that worked in past data but don't reflect genuine audience behaviour, leading to inefficient ad spend allocation.
Lack of Scalability: Models that overfit to specific campaigns, seasons, or platforms won't transfer effectively to new initiatives.
Real-World Example
Imagine you're using AI to optimise your Google Ads bidding strategy. You train your model on 6 months of data from a single product category. The model identifies that every time it rains on Tuesdays in London, conversions spike by 3%. This pattern might exist in your training data purely by coincidence.
When the model applies this rule to new campaigns or different seasons, it performs poorly because the pattern wasn't real – it was noise the model overfitted to.
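This kind of coincidental pattern is easy to demonstrate. The sketch below (hypothetical data, not real campaign figures) tests many purely random signals against a purely random outcome: with a small sample, at least one "pattern" will look convincingly strong, while with a large sample the strongest spurious correlation stays small.

```python
import numpy as np

rng = np.random.default_rng(7)

def max_spurious_corr(n_samples, n_features=100):
    """Largest |correlation| between a random outcome and any of
    n_features random signals - pure noise, by construction."""
    y = rng.normal(size=n_samples)          # e.g. daily conversion lift
    X = rng.normal(size=(n_features, n_samples))  # e.g. weather flags, day-of-week quirks
    return max(abs(np.corrcoef(x, y)[0, 1]) for x in X)

small = max_spurious_corr(20)     # a few months of sparse data
large = max_spurious_corr(2000)   # years of daily data

print(small, large)  # the small sample shows a far stronger "pattern"
```

With only 20 observations, some random signal will correlate strongly with conversions by chance alone – exactly the rainy-Tuesdays trap. With 2,000 observations, noise averages out and no spurious signal survives.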
Overfitting vs Underfitting
While overfitting memorises training data too well, underfitting is the opposite problem: the model is too simple to capture important patterns. Both reduce performance on new data. The goal is finding the "sweet spot" where your model generalises well.
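The sweet spot is easiest to see with model complexity as a dial. A minimal sketch, using synthetic data and polynomial degree as a stand-in for complexity: a degree-1 model underfits a quadratic trend, a degree-2 model matches it, and a degree-15 model chases the noise.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a quadratic trend plus random noise.
x = np.linspace(-1, 1, 40)
y = x**2 + rng.normal(0, 0.1, size=x.shape)

# Hold out every fourth point as a test set.
test_mask = np.arange(x.size) % 4 == 0
x_train, y_train = x[~test_mask], y[~test_mask]
x_test, y_test = x[test_mask], y[test_mask]

def fit_and_score(degree):
    """Fit a polynomial and return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

underfit = fit_and_score(1)    # too simple: misses the curve
good = fit_and_score(2)        # matches the true pattern
overfit = fit_and_score(15)    # flexible enough to memorise noise
```

The overfit model posts the lowest training error of the three, yet the honest comparison is test error, where the matched-complexity model wins.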
How to Prevent Overfitting
Use Validation Data: Split your data into training, validation, and test sets. Monitor performance on validation data – if training accuracy rises but validation accuracy plateaus or drops, you're overfitting.
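A minimal sketch of that split, using plain Python and a 70/15/15 ratio (the ratio and the placeholder records are illustrative assumptions; in practice the records would be rows of historical campaign data):

```python
import random

# Placeholder for 1,000 historical campaign records.
records = list(range(1000))

random.seed(0)
random.shuffle(records)  # shuffle so each split is representative

n = len(records)
n_train = n * 70 // 100
n_val = n * 15 // 100

train = records[:n_train]                  # fit the model here
val = records[n_train:n_train + n_val]     # tune and monitor here
test = records[n_train + n_val:]           # final, untouched check
```

Evaluate on the validation set after each training round; a growing gap between training and validation performance is the overfitting signal described above.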
Regularisation: Apply techniques like L1/L2 regularisation that penalise overly complex models, encouraging simpler, more generalisable solutions.
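To make the penalty concrete, here is a sketch of L2 (ridge) regression in closed form on made-up data. Increasing the penalty strength lam shrinks the learned weights towards zero, trading a little training accuracy for a simpler model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical features (e.g. bid, hour, audience size) and outcomes.
X = rng.normal(size=(50, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(0, 0.5, 50)

def ridge(X, y, lam):
    """L2-regularised least squares: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_plain = ridge(X, y, 0.0)   # ordinary least squares
w_reg = ridge(X, y, 10.0)    # penalised: smaller, more cautious weights
```

The regularised weight vector always has a smaller norm than the unpenalised one, which is exactly the "prefer simpler solutions" pressure described above.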
Cross-Validation: Test your model across multiple data splits to ensure it performs consistently across different subsets.
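The splitting itself is simple enough to sketch by hand. A minimal k-fold splitter (the fold count and sizes here are illustrative): each record serves as validation data exactly once, so you get k performance scores instead of one.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) for each of k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# 10 records, 5 folds: five train/validation splits.
splits = list(k_fold_indices(10, 5))
```

A model whose scores vary wildly across the folds is likely latching onto quirks of particular subsets rather than stable patterns.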
Early Stopping: Stop training when validation performance stops improving, rather than continuing until training data is perfectly fitted.
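A sketch of the stopping rule itself, with a "patience" of three epochs (a common but arbitrary choice) and a made-up validation-loss curve that improves and then drifts upward as the model starts to overfit:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch of the best validation loss, stopping once
    no improvement has been seen for `patience` epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # stop; roll back to the best checkpoint
    return best_epoch

# Validation loss after each epoch: improves, then worsens.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.68]
stop_epoch = early_stop_epoch(losses)  # best model was at epoch 3
```

In practice you would also save a model checkpoint at each new best epoch, then restore that checkpoint when training halts.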
Feature Reduction: Use only the most relevant variables. Every extra feature gives the model another opportunity to latch onto noise, especially when training data is limited.
Increase Training Data: Larger datasets are harder to overfit to, as random noise becomes statistically less significant.
Overfitting in Practice
In a typical media buying scenario, you might notice overfitting when:
- Your model shows 95% accuracy on historical data but 70% on current campaigns
- Predictions work well for one platform (e.g., Facebook) but fail on another (e.g., LinkedIn)
- Campaign rules that worked last quarter don't translate to this quarter
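The first symptom on that list can even be automated as a simple sanity check. A sketch, with a hypothetical 10-point gap as the alarm threshold:

```python
def looks_overfit(train_score, validation_score, max_gap=0.10):
    """Flag a model whose training score exceeds its validation
    score by more than an acceptable gap (threshold is illustrative)."""
    return train_score - validation_score > max_gap

looks_overfit(0.95, 0.70)   # True: the 95% vs 70% gap above is flagged
looks_overfit(0.88, 0.85)   # False: a small gap is normal
```

The right threshold depends on your metric and its natural variance; the point is to compare scores on seen versus unseen data routinely, not just at launch.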
Key Takeaway
Overfitting is a critical consideration when developing AI models for advertising. Always validate your models on data they haven't seen before, and focus on building systems that generalise well to new situations rather than perfecting performance on old data.