
Running ANOVA for selected variables only with grep() in R


I am trying to run ANOVA for multiple outcomes selected with grep(). Below is close to what I have, but it doesn't work, of course. It seems like there should be an elegant and efficient way of doing this with purrr::map() or lapply(), but I cannot figure out how. Also, it would be great if the result for each variable could be stored in a list(?). I think I don't fully understand data types and how they work in R, which leaves me very confused. I would appreciate your advice on a solution!

varlist <- grep("num_weeks_", names(crao2), value=TRUE)
for (i in varlist) {
   anova <- aov(i ~ treatment, data = df)
   summary(anova)
   TukeyHSD(anova)
   rm(anova)
}

Solution

  • I prefer to do things like this using lists and functions like sapply or map. Rather than doing all of your steps in the loop, I would first do all the calls to aov to create an initial list, then call summary and TukeyHSD on that list.

    First create the list:

    varlist <- grep('p', names(mtcars), value=TRUE)
    
    aov.list <- sapply(varlist, function(v){
      f <- reformulate('factor(gear)', v)
      aov(f, data=mtcars)
    }, simplify = FALSE)
    

    Now aov.list (or whatever you want to name it) is a list of the fitted objects, and the names of the list are the values of varlist. This is why I use sapply with simplify = FALSE rather than lapply: sapply keeps the character vector's values as names, while lapply returns an unnamed list.
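
    Two details of the code above can be checked interactively; the following is only a small sketch on mtcars, and by.sapply and by.lapply are names made up for illustration. In the original loop, i ~ treatment refers to a variable literally named i rather than to the column whose name is stored in i, which is why the formula is built from strings with reformulate() instead.

    ```r
    # reformulate() builds a formula from strings:
    # term label(s) on the right-hand side, response on the left
    f <- reformulate("factor(gear)", response = "mpg")
    f  # a formula object, ready to pass to aov()

    # sapply(simplify = FALSE) names the result after the input strings;
    # lapply() on the same character vector returns an unnamed list
    varlist <- grep("p", names(mtcars), value = TRUE)
    by.sapply <- sapply(varlist, nchar, simplify = FALSE)
    by.lapply <- lapply(varlist, nchar)

    names(by.sapply)  # the values of varlist
    names(by.lapply)  # NULL
    ```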

    One drawback to the above is that if you look at the call element of each fitted object, it just shows f for the formula.

    We can make the call look more like we fit each model individually by hand by substituting the formula into the call and then evaluating it:

    aov.list <- sapply(varlist, function(v){
      f <- reformulate('factor(gear)', v)
      eval(substitute(aov(f, data=mtcars), list(f=f)))
    }, simplify = FALSE)
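
    To see the difference the substitution makes, here is a minimal sketch for a single variable. The names fit.plain and fit.subst are illustrative only; both fits should be the same model and differ only in the stored call.

    ```r
    v <- "mpg"
    f <- reformulate("factor(gear)", v)

    # Plain call: the stored call records the symbol f
    fit.plain <- aov(f, data = mtcars)

    # Substituted call: the formula itself is placed in the call before evaluation
    fit.subst <- eval(substitute(aov(f, data = mtcars), list(f = f)))

    fit.plain$call  # formula argument appears as f
    fit.subst$call  # formula argument appears as mpg ~ factor(gear)
    ```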
    

    If you want to use map from the tidyverse/purrr, this does the same thing:

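    library(purrr)  # provides map() and set_names(); the |> pipe needs R >= 4.1

    aov.list <- varlist |>
      set_names() |>   # name each element after itself so map() returns a named list
      map(function(v){
        f <- reformulate('factor(gear)', v)
        eval(substitute(aov(f, data=mtcars), list(f=f)))
      })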
    

    Now we can use lapply or map to do the next steps:

    lapply(aov.list, summary)
    
    aov.list |>
      map(TukeyHSD)
    

    The above just prints the results since we did not assign them. But we could assign the results to new lists for further examination.
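
    As a rough sketch of that last step, the results can be kept in named lists and then indexed by variable name. The names aov.summaries, aov.tukey and p.values are made up for this example, and the p-value extraction assumes a one-way model like the ones above, where the first row of the ANOVA table is the grouping term.

    ```r
    varlist <- grep("p", names(mtcars), value = TRUE)

    aov.list <- sapply(varlist, function(v) {
      f <- reformulate("factor(gear)", v)
      eval(substitute(aov(f, data = mtcars), list(f = f)))
    }, simplify = FALSE)

    # Store the results instead of only printing them
    aov.summaries <- lapply(aov.list, summary)
    aov.tukey <- lapply(aov.list, TukeyHSD)

    # Look at one outcome at a time by name
    aov.summaries$mpg
    aov.tukey[["hp"]]

    # Pull the overall p-value for the grouping term out of each summary
    p.values <- sapply(aov.summaries, function(s) s[[1]][["Pr(>F)"]][1])
    p.values
    ```

    For the data in the question, the same pattern would presumably use reformulate('treatment', v) and the asker's own data frame in place of mtcars.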