I am trying to use glm in R with a data frame containing about 1,000 columns. I want to select a specific independent variable and loop over each of the 1,000 columns representing the dependent variables.
As a test, the glm call works perfectly fine when I specify single columns with df$col1 for both my dependent and independent variables.
I can't seem to subset a range of columns correctly (see below), and I keep getting this error no matter how I format the df:
'data' must be a data.frame, environment, or list
What I tried:
# df is my data frame
cols <- df[, 20:1112]
for (i in cols) {
  glm <- glm(df$col1 ~ ., data=df, family=gaussian)
}
It would be more idiomatic to do:
predvars <- names(df)[20:1112]
glm_list <- list() ## presumably you want to save the results??
for (pv in predvars) {
    glm_list[[pv]] <- glm(reformulate(pv, response = "col1"),
                          data=df, family=gaussian)
}
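Here is a minimal, self-contained sketch of that loop on simulated data, followed by one way to collect the slope and p-value from each saved fit. The toy columns `x1` to `x5` are hypothetical stand-ins for columns 20:1112 of the real data, and the summary table is only one possible way to use `glm_list`.

```r
## Hypothetical toy data: response col1 plus 5 candidate predictors
## (stand-ins for columns 20:1112 of the real data frame)
set.seed(101)
n <- 50
df <- data.frame(col1 = rnorm(n))
for (j in 1:5) df[[paste0("x", j)]] <- rnorm(n)

predvars <- names(df)[2:6]
glm_list <- list()
for (pv in predvars) {
    ## one single-predictor Gaussian GLM per column, stored by name
    glm_list[[pv]] <- glm(reformulate(pv, response = "col1"),
                          data = df, family = gaussian)
}

## Pull the slope estimate and its p-value out of each fit
res <- do.call(rbind, lapply(predvars, function(pv) {
    cf <- coef(summary(glm_list[[pv]]))
    data.frame(predictor = pv,
               estimate  = cf[pv, "Estimate"],
               p.value   = cf[pv, "Pr(>|t|)"])
}))
res
```

For a Gaussian GLM the dispersion is estimated, so `summary()` reports t statistics, hence the `"Pr(>|t|)"` column name.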
In fact, if you really just want a Gaussian GLM, it will be slightly faster to use
lm(reformulate(pv, response = "col1"), data = df)
in the loop instead.
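A quick check, on simulated data, that swapping `glm(..., family = gaussian)` for `lm()` does not change the estimates: a Gaussian GLM with the default identity link is ordinary least squares. The data and the column name `x1` are hypothetical.

```r
## Hypothetical toy data with one predictor x1
set.seed(42)
df <- data.frame(col1 = rnorm(30), x1 = rnorm(30))
f <- reformulate("x1", response = "col1")

fit_glm <- glm(f, data = df, family = gaussian)
fit_lm  <- lm(f, data = df)

## Same coefficients up to floating-point tolerance
all.equal(coef(fit_glm), coef(fit_lm))
```

The speed difference comes from `lm()` solving the least-squares problem directly, whereas `glm()` goes through its iterative fitting machinery even when one step suffices.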
If you want to get fancy:
formlist <- lapply(predvars, reformulate, response = "col1")
lm_list <- lapply(formlist, lm, data = df)
names(lm_list) <- predvars
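The question describes the many columns as the dependent variables with one fixed independent variable, whereas the code above uses `col1` as the response. If that is what you need, the same `lapply()` pattern works with the roles swapped in `reformulate()`. This is a sketch on simulated data; the outcome columns `y1` to `y4` and the name `respvars` are hypothetical.

```r
## Hypothetical toy data: one fixed predictor col1 and outcome
## columns y1..y4 (stand-ins for columns 20:1112)
set.seed(1)
n <- 40
df <- data.frame(col1 = rnorm(n))
for (j in 1:4) df[[paste0("y", j)]] <- 2 * df$col1 + rnorm(n)

respvars <- names(df)[2:5]
## Swap the roles: each column is the response, col1 the predictor
formlist <- lapply(respvars, function(rv) reformulate("col1", response = rv))
lm_list <- lapply(formlist, lm, data = df)
names(lm_list) <- respvars

## Slope on col1 from each fit, named by response column
slopes <- sapply(lm_list, function(m) coef(m)[["col1"]])
slopes
```

Because `lm_list` is named, `sapply()` returns a named vector, so each slope stays labelled with its response column.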