machine-learning statistics

What do the coefficients on correlated variables mean?


The coefficients on uncorrelated variables denote the degree to which the unique information in them influences the target variable. But what do the coefficients on correlated variables mean - which variable wins how much of the "tug-of-war"? (no math, please)


Solution

  • It seems you are asking about multicollinearity - when the independent features in a linear regression model are highly correlated with each other.

    Let's imagine that we want to buy a used car. Most likely, the mileage (x1) will strongly correlate with the age of the car (x2): the older the car, the more km it has probably traveled, and vice versa.
    The linear regression model looks like this: F(x) = w0 + w1·x1 + w2·x2, where the w's are the weights, i.e. the "contribution" of each feature x to the target variable.

    Because mileage and age carry largely the same information, the model cannot tell how to split the "credit" between them: a large weight on mileage (w1) can be offset by a smaller (or even opposite-sign) weight on age (w2), and vice versa, without changing the predictions much. So the individual coefficients become unstable and hard to interpret, and it is difficult to say what unique information each feature brings. That is the multicollinearity problem. With no math - multicollinearity is very bad :-)
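A minimal sketch of this instability on synthetic data (all numbers below are invented for illustration, not taken from any real dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# synthetic used-car data: mileage is almost a linear function of age,
# so x1 (mileage) and x2 (age) are strongly correlated
age = rng.uniform(1, 15, n)                       # years
mileage = 15_000 * age + rng.normal(0, 1_000, n)  # km

# assumed "true" price: both features push the price down
price = 30_000 - 1_000 * age - 0.05 * mileage + rng.normal(0, 500, n)
X = np.column_stack([mileage, age])

# refit on bootstrap resamples: the individual weights wobble, because the
# model can trade w1 off against w2 without changing predictions much
for _ in range(3):
    idx = rng.integers(0, n, n)
    model = LinearRegression().fit(X[idx], price[idx])
    print(model.coef_)  # [w1 (per km), w2 (per year)] shifts between resamples
```

The fitted model still predicts prices well on each resample - it is only the split of credit between the two weights that keeps changing, which is exactly why the coefficients are hard to interpret individually.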

    Usually such features are removed or transformed (for example, the two can be combined into a single feature, such as mileage per year).
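Continuing the same synthetic sketch, one way to apply that transformation (the data-generating numbers are again invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
age = rng.uniform(1, 15, n)                       # years
mileage = 15_000 * age + rng.normal(0, 1_000, n)  # km
price = 30_000 - 1_000 * age - 0.05 * mileage + rng.normal(0, 500, n)

# replace the two strongly correlated columns with one derived feature
km_per_year = mileage / age
X_new = np.column_stack([age, km_per_year])

model = LinearRegression().fit(X_new, price)
print(np.corrcoef(age, km_per_year)[0, 1])  # far weaker than the age-mileage correlation
print(model.score(X_new, price))            # predictive quality stays good
```

The derived feature keeps the informative part of mileage (how intensively the car was driven) while dropping the part that merely duplicates age, so the remaining coefficients are easier to interpret.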