I have an lm model of mortality data as a function of daily temperature. To estimate a possible adaptation to climate change, I would like to reduce the slope of the curve by 10%, so I modified the slope coefficients of the lm model by multiplying them by 0.9.
However, the predict output of this modified model is unexpected. The slope is reduced, but the curves don't meet at x = 0; instead, they meet at the intercept of approx. 133. Which raises the next question: why is this intercept not 0?
Here is a reproducible example:
x <- seq(0, 20, 0.1)
y <- seq(0, 20, 0.1)^2
mod <- lm(y ~ poly(x, 2))
mod$coefficients
(Intercept) poly(x, 2)1 poly(x, 2)2
133.6667 1645.2355 426.9008
mody <- mod
mody$coefficients[2] <- mody$coefficients[2]*0.9
mody$coefficients[3] <- mody$coefficients[3]*0.9
mody$coefficients
(Intercept) poly(x, 2)1 poly(x, 2)2
133.6667 1480.7120 384.2108
plot(x, predict(mod), type="l")
lines(x, predict(mody), col="red")
I have tried to find the reason for the shifted output, and I assume it lies somewhere in the predict() function. To check the modified coefficients, I tried this, and it shows the expected output:
curve(coef(mod)[1] + coef(mod)[2] * x + coef(mod)[3] * x^2, from=0, to=20, xlab="x", ylab="y")
curve(coef(mody)[1] + coef(mody)[2] * x + coef(mody)[3] * x^2, from=0, to=20, xlab="x", ylab="y", add = TRUE)
Why is the predict output different?
Why is the Intercept of the example not 0?
Or how can I "predict" the modified data "by hand" without using predict()?
Thanks for your help!
EDIT: The answer by DaveArmstrong solved the problem for the first example by using raw=TRUE in poly(). However, with other data it still does not work, maybe due to negative coefficients in the model (?)
Here is an example with my original data:
x <- seq(15.0, 23.5, 0.1)
y <- c(0.992, 0.998, 1.012, 1.013, 1.015, 1.021, 1.028, 1.027, 1.023, 1.029, 1.032, 1.032, 1.029, 1.036, 1.035, 1.041, 1.043, 1.043, 1.037, 1.037, 1.039, 1.037, 1.041, 1.047, 1.047, 1.048, 1.045, 1.048, 1.044, 1.037, 1.046, 1.042, 1.037, 1.034, 1.032, 1.031, 1.030, 1.034,
1.044, 1.046, 1.036, 1.034, 1.049, 1.050, 1.037, 1.041, 1.046, 1.062, 1.077, 1.084, 1.091, 1.106, 1.114, 1.127, 1.120, 1.122, 1.130,
1.122, 1.135, 1.164, 1.187, 1.186, 1.195, 1.201, 1.197, 1.204, 1.201, 1.205, 1.203, 1.200, 1.205, 1.232, 1.218, 1.218, 1.249, 1.245,
1.253, 1.227, 1.232, 1.252, 1.258, 1.254, 1.248, 1.245, 1.261, 1.289)
org <- lm(y ~ poly(x, 2, raw=TRUE))
coef(org)
(Intercept) poly(x, 2, raw = TRUE)1 poly(x, 2, raw = TRUE)2
2.240583377 -0.153426285 0.004822839
orgm <- org
orgm$coefficients[2] <- orgm$coefficients[2]*1.1 #reducing negative coefficient
orgm$coefficients[3] <- orgm$coefficients[3]*0.9
plot(predict(org), ylim=c(0,1.5), type="l")
lines(predict(orgm), col="red")
legend("topleft", legend=c("Original", "Modified"), col=c("black", "red"), lty=c(1,1))
In the resulting plot, the modified model is shifted to lower y-values and the slope also seems incorrect. Why is the output still not as expected?
I think the problem is that the poly() function orthogonalizes the polynomial regressors by default. In your example, the only real relationship in the data is with the squared term. If you use the raw polynomials instead, it should work.
x <- seq(0, 20, 0.1)
y <- seq(0, 20, 0.1)^2
mod <- lm(y ~ poly(x, 2, raw=TRUE))
mod$coefficients
# (Intercept) poly(x, 2, raw = TRUE)1 poly(x, 2, raw = TRUE)2
# -6.961533e-14 1.658415e-14 1.000000e+00
mody <- mod
mody$coefficients[2] <- mody$coefficients[2]*0.9
mody$coefficients[3] <- mody$coefficients[3]*0.9
mody$coefficients
# (Intercept) poly(x, 2, raw = TRUE)1 poly(x, 2, raw = TRUE)2
# -6.961533e-14 1.492574e-14 9.000000e-01
plot(x, predict(mod), type="l")
lines(x, predict(mody), col="red")
legend("topleft", legend=c("Original", "Modified"), col=c("black", "red"), lty=c(1,1))
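Incidentally, this also answers the "by hand" part of the question: with raw polynomial coefficients, a prediction is just the design matrix times the coefficient vector. A minimal sketch, rebuilding the mod/mody objects from the raw-polynomial fit above (variable names X and yhat_by_hand are mine):

```r
# Refit the example with raw polynomials and scale the slope coefficients
x <- seq(0, 20, 0.1)
y <- x^2
mod <- lm(y ~ poly(x, 2, raw = TRUE))
mody <- mod
mody$coefficients[2:3] <- mody$coefficients[2:3] * 0.9

# Design matrix: intercept column, x, x^2
X <- cbind(1, x, x^2)
yhat_by_hand <- drop(X %*% coef(mody))

# Agrees with predict() on the modified model
all.equal(yhat_by_hand, unname(predict(mody)))
```

This works precisely because, with raw=TRUE, the columns of the model matrix are the plain powers of x, so the coefficients mean what you expect them to mean.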
For a bit more context, here is how the orthogonalized polynomials relate to the raw polynomials in this example. The first column gives the coefficients relating the raw polynomials to the first orthogonalized polynomial regressor, and the second column gives the coefficients relating them to the second-order orthogonalized polynomial regressor.
p2 <- poly(x, 2)
round(coef(lm(p2 ~ poly(x, 2, raw=TRUE))), 5)
# 1 2
# (Intercept) -0.12156 0.15538
# poly(x, 2, raw = TRUE)1 0.01216 -0.04685
# poly(x, 2, raw = TRUE)2 0.00000 0.00234
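You can check this mapping numerically by folding the gamma coefficients back into raw-scale coefficients; a sketch (the names G and beta_raw are mine):

```r
# Orthogonal-polynomial fit of the first example
x <- seq(0, 20, 0.1)
y <- x^2
mod <- lm(y ~ poly(x, 2))

# Gamma matrix: raw polynomials -> orthogonalized regressors (as above)
G <- coef(lm(poly(x, 2) ~ poly(x, 2, raw = TRUE)))

# Fold the gammas into the orthogonal coefficients to recover
# the raw-scale coefficients: intercept absorbs b1*g01 + b2*g02
b <- coef(mod)
beta_raw <- c(b[1], 0, 0) + G %*% b[2:3]
round(drop(beta_raw), 6)
# recovers the raw-polynomial coefficients (~0, ~0, 1)
```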
Substituting these into the equation with the orthogonal polynomials, you would get the following (where z1 and z2 represent the first- and second-order orthogonalized regressors):

y = b0 + b1*z1 + b2*z2
  = b0 + b1*(-0.12156 + 0.01216*x) + b2*(0.15538 - 0.04685*x + 0.00234*x^2)
When you multiply the orthogonalized polynomial coefficients by 0.9, you're doing the following:

y = b0 + 0.9*b1*(-0.12156 + 0.01216*x) + 0.9*b2*(0.15538 - 0.04685*x + 0.00234*x^2)
With respect to the original variables, you're also changing the intercept when you modify the coefficients on the orthogonalized regressors.
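A quick numeric illustration of that intercept change, reusing the first example (int_orig and int_mod are names I made up):

```r
# Orthogonal fit and the gamma mapping from the first example
x <- seq(0, 20, 0.1)
y <- x^2
b <- coef(lm(y ~ poly(x, 2)))
G <- coef(lm(poly(x, 2) ~ poly(x, 2, raw = TRUE)))

# Raw-scale intercept before and after scaling the orthogonal slopes by 0.9:
# the constant parts of z1 and z2 (first row of G) get scaled too
int_orig <- b[1] +       sum(G[1, ] * b[2:3])
int_mod  <- b[1] + 0.9 * sum(G[1, ] * b[2:3])
c(int_orig, int_mod)
# the original raw-scale intercept is ~0; the modified one shifts to ~13.37
```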
The solution above worked because the relationship of interest was relatively simple: the intercept and the coefficient on the first-order term were both approximately zero. When this isn't the case, the answer is somewhat more complex. In the real-data example above, the x variable has a minimum value of 15. My assumption is that we would want the two curves to meet at x = 15, but with the modified one having a shallower slope. For this, we need to think about what that means in terms of the original and modified coefficients. In particular, the two equations must produce the same prediction at x = 15. Using b to represent the original coefficients and b' to represent the modified coefficients, we would want the following to be true:

b0 + b1*15 + b2*15^2 = b0 + b1'*15 + b2'*15^2
Doing a bit of algebra, you would get:

b2' = (b1 - b1')/15 + b2
So, implementing this: say you multiplied the coefficient on the first-order polynomial term by 0.9. That would give:
orgm <- org
orgm$coefficients[2] <- orgm$coefficients[2]*0.9
orgm$coefficients[2]
# poly(x, 2, raw = TRUE)1
# -0.1379442
We could then calculate the difference between the original and modified coefficients:
diff <- org$coefficients[2] - orgm$coefficients[2]
diff
# poly(x, 2, raw = TRUE)1
# -0.01532713
Finally, we could plug this and the original coefficient on the second-order polynomial regressor into the formula to create the modified second-order polynomial regressor coefficient:
orgm$coefficients[3] <- diff/15 + org$coefficients[3]
orgm$coefficients
# (Intercept) poly(x, 2, raw = TRUE)1 poly(x, 2, raw = TRUE)2
# 2.239156804 -0.137944190 0.003796868
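As a sanity check, the original and modified coefficient vectors should produce identical predictions at x = 15. A sketch using the rounded coefficients printed earlier (b and bp are names I chose for the two vectors):

```r
# Original coefficients (rounded from the output above)
b <- c(2.2405834, -0.1534263, 0.0048228)

b1p <- b[2] * 0.9                 # scale the first-order slope by 0.9
b2p <- (b[2] - b1p) / 15 + b[3]   # formula derived above
bp  <- c(b[1], b1p, b2p)          # modified coefficient vector

p_org <- sum(b  * c(1, 15, 15^2))
p_mod <- sum(bp * c(1, 15, 15^2))
all.equal(p_org, p_mod)           # the two curves meet at x = 15
```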
Then, we could make the plot:
plot(x, predict(org), ylim=c(0,1.5), type="l")
lines(x, predict(orgm), col="red")
legend("topleft", legend=c("Original", "Modified"), col=c("black", "red"), lty=c(1,1))
I think this is the result you're looking for.