I'm running a weighted regression model, but I don't know how to handle some of the variables I need to include.
My dependent variable has values on the scale of thousands, while my independent variables are on the scale of tens and hundreds or are categorical.
I usually run the regression with the log of the dependent variable (so that I can interpret each estimated coefficient as an approximate percentage change in the dependent variable).
Here is an example:
How should I handle a regressor that is on the scale of millions?
For example, I include in my regression the variable `occ_tot`, expressed in millions. This is what happens:
How should I interpret these coefficients? Is there a good way to include an independent variable on a much larger scale than the dependent one?
I'm new to this kind of thing...
We can rescale a predictor as desired, and normally the coefficients simply compensate: if we multiply a predictor by 100, its coefficient is divided by 100 (and vice versa), while the other coefficients are not affected.
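Here is a minimal sketch of this on simulated data, closer to the setup in the question: a log-transformed response in the thousands, regression weights, and one predictor on the scale of millions. The names `occ_raw`, `occ_mil`, `x2` and `w` are hypothetical. Expressing the same predictor in millions multiplies its coefficient by 1e6 and leaves everything else unchanged. With `log(y)` as the response, `100 * (exp(b) - 1)` is the percentage change in `y` for a one-unit increase in the predictor, which here means a one-million increase (for small `b` this is close to `100 * b`).

```r
set.seed(1)
n <- 200
# Hypothetical data: a predictor on the scale of millions, a small-scale
# predictor, and a response in the thousands
occ_raw <- runif(n, 1e6, 5e7)
x2 <- rnorm(n, 50, 10)
w <- runif(n, 0.5, 2)                 # hypothetical regression weights
y <- exp(7 + 2e-8 * occ_raw + 0.01 * x2 + rnorm(n, sd = 0.1))
occ_mil <- occ_raw / 1e6              # the same predictor, expressed in millions

fit_raw <- lm(log(y) ~ occ_raw + x2, weights = w)
fit_mil <- lm(log(y) ~ occ_mil + x2, weights = w)

b_raw <- coef(fit_raw)[["occ_raw"]]
b_mil <- coef(fit_mil)[["occ_mil"]]

# Should be TRUE: the coefficient scales by exactly the factor applied to the predictor
all.equal(b_mil, b_raw * 1e6)
# Should be TRUE: the intercept and the x2 coefficient are unchanged
all.equal(coef(fit_raw)[c("(Intercept)", "x2")], coef(fit_mil)[c("(Intercept)", "x2")])

# Percentage change in y for a one-million increase in the predictor
100 * (exp(b_mil) - 1)
```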
If some predictors are close to linearly dependent you can run into problems, but that happens even without scaling, so it is really a separate issue. Look at `findCorrelation` in caret to eliminate highly correlated predictors, and try the model with and without those predictors to see whether it matters in your case.
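With caret installed, `caret::findCorrelation(cor(preds), cutoff = 0.85)` returns the indices of columns it suggests removing. As a rough base-R stand-in (a simpler check, not caret's algorithm), one can list the predictor pairs whose absolute correlation exceeds a cutoff. The 0.85 cutoff and the choice of `mtcars` columns are arbitrary assumptions for illustration.

```r
# Candidate predictors from the built-in mtcars data
preds <- mtcars[, c("cyl", "disp", "hp", "wt")]
cm <- cor(preds)

cutoff <- 0.85  # arbitrary threshold, adjust to taste
# Look only at the upper triangle so that each pair is reported once
idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
high_pairs
```

Refitting with and without one member of each flagged pair shows whether the collinearity actually changes your conclusions.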
The first `lm` below is our original regression. In the second `lm` we multiply the `wt` predictor by 100, and we see that its coefficient is simply divided by 100 while the other coefficients stay the same. A similar thing happens with the third `lm`, where we divide `wt` by 100: the coefficient compensates again and the other coefficients are again unchanged.
Also note that all three of these are really the same model except for parameterization so they result in the same fitted values, the same residuals and the same residual sum of squares.
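This can be verified directly. The sketch below refits the three models from the code that follows and compares their fitted values, residuals and residual sum of squares (`deviance()` returns the RSS for an `lm` fit). It also shows that the t statistic of the rescaled term is unchanged, since its standard error scales by the same factor as the coefficient.

```r
m1 <- lm(mpg ~ cyl + wt, mtcars)
m2 <- lm(mpg ~ cyl + I(100 * wt), mtcars)
m3 <- lm(mpg ~ cyl + I(wt/100), mtcars)

# Same fitted values and residuals (each should be TRUE)
all.equal(fitted(m1), fitted(m2))
all.equal(fitted(m1), fitted(m3))
all.equal(residuals(m1), residuals(m3))

# Same residual sum of squares
c(deviance(m1), deviance(m2), deviance(m3))

# The rescaled term has the same t statistic and p-value
summary(m1)$coefficients["wt", c("t value", "Pr(>|t|)")]
summary(m2)$coefficients["I(100 * wt)", c("t value", "Pr(>|t|)")]
```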
```r
coef(lm(mpg ~ cyl + wt, mtcars)) # original lm
## (Intercept)         cyl          wt 
##   39.686261   -1.507795   -3.190972 

coef(lm(mpg ~ cyl + I(100 * wt), mtcars))
## (Intercept)         cyl I(100 * wt) 
## 39.68626148 -1.50779497 -0.03190972 

coef(lm(mpg ~ cyl + I(wt/100), mtcars))
## (Intercept)         cyl   I(wt/100) 
##   39.686261   -1.507795 -319.097214 
```
If we have a log predictor, then changing `wt` to `100*wt` will only affect the intercept, because `log(100 * wt) = log(100) + log(wt)`. Below, the first `lm` takes `log(wt)` and the second takes `log(100*wt)`.
```r
coef(lm(mpg ~ cyl + log(wt), mtcars))
## (Intercept)         cyl     log(wt) 
##   40.649777   -1.233526  -11.523812 

coef(lm(mpg ~ cyl + log(100 * wt), mtcars))
##   (Intercept)           cyl log(100 * wt) 
##     93.718894     -1.233526    -11.523812 
```
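The size of the intercept shift can be predicted. Writing the first model as `a + b*log(wt)` and substituting `log(wt) = log(100*wt) - log(100)` gives `(a - b*log(100)) + b*log(100*wt)`, so the new intercept is the old one minus the `log(wt)` coefficient times `log(100)`. A quick check of that arithmetic:

```r
m_log    <- lm(mpg ~ cyl + log(wt), mtcars)
m_log100 <- lm(mpg ~ cyl + log(100 * wt), mtcars)

a <- coef(m_log)[["(Intercept)"]]
b <- coef(m_log)[["log(wt)"]]

# Predicted intercept of the rescaled model, compared with the fitted one
a - b * log(100)
coef(m_log100)[["(Intercept)"]]

# Should be TRUE: the slopes are identical (names differ, so ignore attributes)
all.equal(coef(m_log)[-1], coef(m_log100)[-1], check.attributes = FALSE)
```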