
Thursday, 21 February 2019

# Machine Learning | Prediction Models | Bias-Variance Tradeoff

Whenever we discuss model prediction, it’s important to understand prediction errors (bias and variance). Many times, there is a trade-off between a model’s ability to minimize bias and variance. Gaining a proper understanding of these errors helps us not only to build accurate models but also to avoid the mistakes of overfitting and underfitting.
## What is bias?

Bias is the difference between the average prediction of the model and the actual value we are trying to predict. Models with high bias pay little attention to the training data and oversimplify the problem, which generally leads to high error on both training and test data. This is what we generally define as underfitting.
## What is variance?

Variance is the variability of the model’s prediction for a given data point; it tells us how spread out our predictions are. A model with high variance pays too much attention to the training data and does not generalize well to data it hasn’t seen before. As a result, these models perform well on the training data but have high error rates on test data. This is what we generally define as overfitting.

What programmers actually want is a model that accurately extracts the regularities in the training data and also generalizes well to unseen data.

The most important part of the whole data mining process, and one that is commonly ignored, is understanding how to test and validate the model against real-world data.
What we ideally want is low bias and low variance. To achieve this, let’s first have a look at a typical bias–variance curve.

(Figure: typical bias²–variance curve, with total error minimized at an intermediate model complexity.)

In supervised learning, underfitting occurs when a model is unable to capture the underlying patterns in the data. Such models usually have high bias and low variance. It mainly happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data. These models, such as linear and logistic regression, are too simple to capture complex patterns in the data. One way of reducing the underfitting problem is to increase the size of the dataset.
In supervised learning, overfitting mainly occurs when our model captures the noise along with the underlying patterns in the data. It happens when we train our model extensively on a noisy dataset. These models have low bias and high variance. Highly flexible models, such as decision trees, are especially prone to it.
If the model is very simple and has few parameters, it may have high bias and low variance. On the other hand, if the model has a large number of parameters, it is likely to have high variance and low bias. So we need to find the right balance, neither overfitting nor underfitting the data: an algorithm can’t be both more complex and less complex at the same time.
## Total Error

To build a good model, we first need to find a good balance between bias and variance, so that the total error is minimized. A model with an optimal balance of bias and variance neither overfits nor underfits the data.
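The balance described above is usually written as a decomposition of the expected test error; a standard form (with σ² denoting the irreducible noise discussed next) is:

```latex
\mathrm{Err}(x)
  = \underbrace{\left(\mathrm{Bias}[\hat{f}(x)]\right)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathrm{Var}[\hat{f}(x)]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
```

A simple model makes the first term large; a very flexible model makes the second term large; the third term cannot be reduced by any model.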

Even if we had a perfect model, we might not be able to completely remove the errors made by a learning algorithm, because the training data itself contains noise. This error is called the irreducible error, Bayes’ error rate, or the optimum error rate.

## Fixing the Problem!

When you are new to machine learning, you may feel lost when the model you have trained does not perform well enough. People often waste a lot of time trying different fixes based on what they feel is right, rather than first diagnosing whether the model is underfitting or overfitting.

## Working on Code!

Let’s see how we can identify whether our model is overfitting or underfitting.

### Overfitting

We will take the example of decision trees.

We will first split our dataset into training and test sets, train the model, and then compare the training and test accuracy.

Note: the dataset used here is the iris dataset that comes with the sklearn library.
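The article’s original code listing isn’t shown; a minimal reconstruction, assuming a default (fully grown) `DecisionTreeClassifier` on the iris dataset, could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset bundled with sklearn and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With no depth limit, the tree grows until every leaf is pure,
# so it memorizes the training set
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("Training accuracy: {:.2f}".format(clf.score(X_train, y_train)))
print("Test accuracy:     {:.2f}".format(clf.score(X_test, y_test)))
```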

Here, we can see that the training accuracy is 1.00, i.e. 100%, while the test set accuracy is noticeably lower. This clearly shows that the model is overfitting.

We need additional strategies to avoid overfitting. One method is to prevent the tree from becoming too detailed and complex by stopping its growth early; this is called pre-pruning. Another strategy is to build a complete tree with pure leaves and then prune it back into a simpler form; this is called post-pruning.

The `max_depth` parameter is used here to pre-prune the tree.
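The pre-pruned version isn’t shown in the article; a sketch under the same assumptions, with `max_depth=3` as an illustrative (not tuned) choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop the tree from growing past a fixed depth
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("Training accuracy: {:.2f}".format(pruned.score(X_train, y_train)))
print("Test accuracy:     {:.2f}".format(pruned.score(X_test, y_test)))
```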

Here, the training set accuracy has dropped a little, but the gap between training and test accuracy has narrowed considerably.

Suppose we want to look at our decision tree classifier after setting the maximum depth. We can plot the tree using the plot_decision_tree method.
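The `plot_decision_tree` helper used in the article isn’t defined here; a comparable sketch using scikit-learn’s built-in `tree.plot_tree` (available from scikit-learn 0.21) would be:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Draw the pruned tree with feature and class names at each node
plt.figure(figsize=(12, 6))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()
```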

(Output: a plot of the pruned decision tree.)

### Underfitting

Here, we will take an example of Linear Regression.

We randomly generate 10 points, each with a feature value and a target, and fit a straight line to them.
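The article’s plotting code isn’t shown; a minimal reconstruction, assuming numpy-generated points with a non-linear underlying relationship:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)

# 10 random points with a cubic relationship plus noise
X = rng.uniform(-3, 3, size=(10, 1))
y = 0.5 * X.ravel() ** 3 + rng.normal(scale=2.0, size=10)

# Fit a plain straight line to the points
model = LinearRegression().fit(X, y)

plt.scatter(X, y, label="data")
xs = np.linspace(-3, 3, 100).reshape(-1, 1)
plt.plot(xs, model.predict(xs), label="linear fit")
plt.legend()
plt.show()
```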

(Output: scatter plot of the 10 points with the fitted straight line.)

We see that the straight line captures the general trend but isn’t a good enough fit. So, we try a more suitable approach.

To improve accuracy, we need more features.

The easiest way to add more features is to compute polynomial features from the existing ones. That is, if we have a feature X, we can use X², X³, etc. as additional features.
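Continuing the sketch above (same assumed data), polynomial expansion can be done with sklearn’s `PolynomialFeatures` in a pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(10, 1))
y = 0.5 * X.ravel() ** 3 + rng.normal(scale=2.0, size=10)

# Expand X into [1, X, X^2, X^3] before fitting the linear model
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(X, y)

linear_model = LinearRegression().fit(X, y)
print("Linear R^2:     {:.2f}".format(linear_model.score(X, y)))
print("Polynomial R^2: {:.2f}".format(poly_model.score(X, y)))
```

Because the polynomial feature space contains the original features, the training fit can only improve; whether the test fit improves depends on the chosen degree.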

(Output: the polynomial curve fitting the 10 points more closely than the straight line.)