My current problem is to calculate the variance explained by the different variables of a general additive model (GAM) with R.
I followed the explanation given by Wood here : https://stat.ethz.ch/pipermail/r-help/2007-October/142743.html
But I would like to do it with three variables. I tried this :
library(mgcv)
set.seed(0)
n<-400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f1 <- function(x) exp(2 * x) - 3.75887
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0.008*x^2 - 1.8*x + 874
f <- f1(x1) + f2(x2) + f3(x3)
e <- rnorm(n, 0, 2)
y <- f + e
b <- gam(y ~ s(x1, k = 3)+s(x2, k = 3)+ s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))
b0 <- gam(y~1)
(deviance(b1)-deviance(b))/deviance(b0)
(deviance(b2)-deviance(b))/deviance(b0)
(deviance(b3)-deviance(b))/deviance(b0)
But I don't understand results. For example, the model with only x1 and x2 has a deviance smaller than deviance with the three explanatory variable.
Does the method I used to extract variance explained by variable with three variables is correct?
Does it mean that there is a confounding effect in the global model? Or is there another explanation?
You did something wrong here:
b <- gam(y ~ s(x1, k = 3) + s(x2, k = 3) + s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))
Why did you set k = 3
in the first line, while not setting k = 3
for the rest? Without specifying k
, s()
will take default value k = 10
. Now you get a problem: b1
, b2
, b3
are not nested in b
.
In Simon Wood's original example, he left k
unspecified, so that k=10
is taken for all s()
. In fact, you can vary k
values, but you must gurantee that you always have the same k
for the same covariate (to ensure nesting). For example, you can do:
b <- gam(y ~ s(x1, k = 4) + s(x2, k = 6) + s(x3, k = 3))
b3 <- gam(y ~ s(x1, k = 4) + s(x2, k = 6), sp = c(b$sp[1], b$sp[2])) ## droping s(x3) from b
b2 <- gam(y ~ s(x1, k = 4) + s(x3, k = 3), sp = c(b$sp[1], b$sp[3])) ## droping s(x2) from b
b1 <- gam(y ~ s(x2, k = 6) + s(x3, k = 3), sp = c(b$sp[2], b$sp[3])) ## droping s(x1) from b
Then let's do:
(deviance(b1)-deviance(b))/deviance(b0)
# [1] 0.2073421
(deviance(b2)-deviance(b))/deviance(b0)
# [1] 0.4323154
(deviance(b3)-deviance(b))/deviance(b0)
# [1] 0.02094997
The positive values imply that dropping any model term will inflate the deviance, which is sensible as our true model have all three terms.