machine-learning, data-modeling, variance, supervised-learning

Can a model have both high bias and high variance? Overfitting and Underfitting?


As I understand it, when creating a supervised learning model, our model may have high bias if we make very simple assumptions (for example, that our function is linear), which causes the algorithm to miss relationships between our features and the target output, resulting in errors. This is underfitting.

On the other hand, if we make our algorithm too flexible (say, with many polynomial features), it will be very sensitive to small fluctuations in our training set, modeling the random noise in the training data rather than the intended outputs. This is overfitting.

[Image showing underfitting and overfitting]
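To make this concrete, here is roughly what I mean in code. It assumes numpy and scikit-learn, uses a noisy sine curve as the data, and takes degrees 1 and 15 as stand-ins for "too simple" and "too flexible"; those specifics are just one way to reproduce the picture above, not essential to the question.

```python
# Sketch: an underfit linear model vs. an overfit high-degree polynomial
# on the same noisy sine data (numpy and scikit-learn assumed).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=30)

X_test = rng.uniform(0, 1, size=(200, 1))
y_test = np.sin(2 * np.pi * X_test).ravel() + rng.normal(0, 0.2, size=200)

for degree in (1, 15):  # degree 1 underfits, degree 15 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The degree-1 model has high error on both the training and test sets (underfitting), while the degree-15 model drives the training error near zero but does worse on fresh data (overfitting).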

This makes sense to me, but I have heard that a model can have both high variance and high bias, and I just don't understand how that would be possible. If high bias and high variance are synonyms for underfitting and overfitting, then how can you have both overfitting and underfitting in the same model? Is it possible? How can it happen? What does it look like when it does?


Solution

  • Imagine a regression problem. I define a predictor which, for all possible inputs, outputs the maximum of the target variable observed in the training data.

    This model is both highly biased (it can only represent a single output, no matter how rich or varied the input) and has high variance (the maximum of a dataset will exhibit a lot of variability between datasets); the small simulation after this answer makes that concrete.

    You're right, to a certain extent, that high bias means a model is likely to underfit and high variance means it's susceptible to overfitting, but they're not quite the same thing.
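Here is a minimal simulation of that predictor (numpy assumed; the N(0, 1) targets and the sample sizes are arbitrary choices for illustration, not part of the answer itself). It refits the "always predict the max" model on many fresh training sets and compares it with a baseline that predicts the training mean:

```python
# Sketch: bias and variance of the "always predict the max" model,
# measured across many independently drawn training sets.
import numpy as np

rng = np.random.default_rng(42)
n_datasets, n_train = 1000, 50

max_preds, mean_preds = [], []
for _ in range(n_datasets):
    y_train = rng.normal(0.0, 1.0, size=n_train)  # targets ~ N(0, 1), true mean 0
    max_preds.append(y_train.max())    # the "always predict the max" model
    mean_preds.append(y_train.mean())  # a sane constant baseline

max_preds = np.array(max_preds)
mean_preds = np.array(mean_preds)

# The true mean is 0, so bias is just the average prediction.
print(f"max model:  bias={max_preds.mean():.2f}  variance={max_preds.var():.3f}")
print(f"mean model: bias={mean_preds.mean():.2f}  variance={mean_preds.var():.3f}")
```

On a typical run, the max model's average prediction sits far above the true mean of 0 (high bias), and its predictions also vary far more across datasets than the baseline's (high variance): it manages to be bad in both ways at once.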