I'm adjusting a response surface using the lm
function on R. On output, using fit[["fitted.values"]]
, the reported values are consistent with the actual values. However, when I use the coefficients generated for each effect (Intercept, x1, x1*x2
) and do the manual calculation, the values presented are very discrepant.
How does the fitted.values
function do its calculations?
Below, part of the code and outputs:
data = read.csv("data.csv", header = TRUE, sep = ";")
head(data)
fit <- lm(rend ~ poly(wis, enz, degree = 2), data = data)
summary(fit)
The output for head(data)
:
wis enz rend
1 10.00 3 68
2 10.25 3 66
3 10.50 3 64
4 10.75 3 62
5 11.00 3 61
6 11.25 3 60
Part of the output of fit[["fitted.values"]]
:
1 2 3 4 5 6 7 8 9 10 11 12
67.02258 65.46832 64.01733 62.66962 61.42517 60.28399 59.24609 58.31146 57.48009 56.75200 56.12718 55.60564
13 14 15 16 17 18 19 20 21 22 23 24
55.18736 54.87235 54.66062 54.55215 54.54696 54.64504 54.84639 55.15101 55.55891 56.07007 56.68450 57.40221
And the summary coeficients:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.23929 0.02293 2888.536 <2e-16 ***
poly(wis, enz, degree = 2)1.0 -24.26976 0.70940 -34.211 <2e-16 ***
poly(wis, enz, degree = 2)2.0 129.35949 0.70940 182.349 <2e-16 ***
poly(wis, enz, degree = 2)0.1 130.19258 0.70940 183.524 <2e-16 ***
poly(wis, enz, degree = 2)1.1 -701.60447 21.94572 -31.970 <2e-16 ***
poly(wis, enz, degree = 2)0.2 0.48360 0.70940 0.682 0.496
What I did was apply these coefficients as follows, considering only the significant ones:
rend = 66.24 - 24.26*wis + 129.36*wis^2 + 130.19*enz - 701.60*wis*enz
As an example, using the first values of the output of head(data):
rend = 66.24 - 24.26*10 + 129.36*10^2 + 130.19*3 - 701.60*10*3
Which results in 7897.79 which is different from estimated by fitted.values
which was 67.02
What am I doing wrong?
The poly()
function constructs an orthogonal polynomial basis by default, which is very different from the standard b0 + b1*x1 + b2*x2 + b12*x1*x2 ...
parameterization. If you want the standard parameterization you can use poly(wiz, env, degree = 2, raw = TRUE)
.
If you want to see what poly()
is doing by default you can use model.matrix(fit)
to see what's in the X
matrix (in the sense of a linear model defined as y = X %*% beta
) (the actual mathematical details/definition of how the polynomials are defined is a little deep: What does the R function `poly` really do? )