I have dataset with multiple outcome variables that I would like to test against one predictor. on exploratory analysis I noted some of the relationships are polynomial to the degree 2 rather than linear. I would like to look at the BIC and AIC to make my decision of which is the best model to run.
I have lapply
function where I can iterate over multiple outcome variables but now I would like to add a second model and compare their fit. However when I run this function, it only saves the second model and I dont know how to get to 'outputs' to run through the next function. Can I have this within one function or do I need two?
Here is a example from the iris dataset
data(iris)
vars <- names(iris[2:4])
models2 <- lapply(vars, function(x) {
model_list=list(
mod1=lm(substitute(i ~ Sepal.Length, list(i=as.name(x))), data=iris),
mod2=lm(substitute(i ~ poly(Sepal.Length,2), list(i=as.name(x))), data=iris))
})
y <- lapply(models2, summary) #This only saves results from mod2
How do I then compare mod1
to mod2
fit and extract the following variable's?
data.frame(
do.call(merge, list(BIC(mod1, mod2), AIC(mod1, mod2))),
logLik=sapply(list(mod1, mod2), logLik),
anova(mod1, mod2, test='Chisq'))
First, make the lm
nicer using do.call
and reformulate
. Then lapply
over the models like this:
> models2 <- lapply(setNames(vars, vars), function(x) {
+ list(
+ mod1=do.call('lm', list(reformulate('Sepal.Length', x), quote(iris))),
+ mod2=do.call('lm', list(reformulate('poly(Sepal.Length, 2)', x), quote(iris)))
+ )
+ })
>
> (res <- lapply(models2, \(x) data.frame(
+ with(x, do.call('merge', list(BIC(mod1, mod2), AIC(mod1, mod2)))),
+ logLik=with(x, sapply(list(mod1, mod2), logLik)),
+ with(x, anova(mod1, mod2))
+ )))
$Sepal.Width
df BIC AIC logLik Res.Df RSS Df Sum.of.Sq F Pr..F.
1 3 188.4963 179.4644 -86.73221 148 27.91566 NA NA NA NA
2 4 189.8107 177.7682 -84.88410 147 27.23618 1 0.6794752 3.667285 0.05743267
$Petal.Length
df BIC AIC logLik Res.Df RSS Df Sum.of.Sq F Pr..F.
1 3 396.1669 387.1350 -190.5675 148 111.4592 NA NA NA NA
2 4 389.8649 377.8223 -184.9112 147 103.3623 1 8.096848 11.51519 0.000887343
$Petal.Width
df BIC AIC logLik Res.Df RSS Df Sum.of.Sq F Pr..F.
1 3 192.4030 183.3711 -88.68553 148 28.65225 NA NA NA NA
2 4 182.3757 170.3331 -81.16656 147 25.91907 1 2.733179 15.50122 0.000126881
You could additionally rbind
.
> do.call('rbind', res)
df BIC AIC logLik Res.Df RSS Df Sum.of.Sq F Pr..F.
Sepal.Width.1 3 188.4963 179.4644 -86.73221 148 27.91566 NA NA NA NA
Sepal.Width.2 4 189.8107 177.7682 -84.88410 147 27.23618 1 0.6794752 3.667285 0.057432671
Petal.Length.1 3 396.1669 387.1350 -190.56750 148 111.45916 NA NA NA NA
Petal.Length.2 4 389.8649 377.8223 -184.91116 147 103.36231 1 8.0968484 11.515191 0.000887343
Petal.Width.1 3 192.4030 183.3711 -88.68553 148 28.65225 NA NA NA NA
Petal.Width.2 4 182.3757 170.3331 -81.16656 147 25.91907 1 2.7331790 15.501223 0.000126881
Data:
> data(iris)
> vars <- names(iris[2:4])