rloopslapplyanova

2-Way Anova Loop & Export to CSV or Excel Table


I've figured out how to run a 2-way anova on several variables in my data frame, but not sure how to get this into a format that could be easily exported to a csv file or excel. Ideally, I'd like it to have this in a format where each of my several hundred dependent variables is in it's own row, with the pVaules and Fvalues

I've made an example using the titanic dataset. In this case I've set Sex & Embarked as my categorical variables, and would like the output for the effects of Sex Embarked and ~Interaction somehow saved to a file. I'm open to suggestions on how to output this -- just want to be able to easily identify what values are significant, ideally with each dependent variable on its own line.

library(titanic)
library(tidyverse)

df1 <- 
  titanic_train %>%
  select(Sex, Embarked, (1:10)) %>%
  select(!("Name" | "Ticket")) %>%
  filter(Embarked != "") # deleting empty Embarked status
names(df1)

df1$Sex<- factor(df1$Sex)
df1$Embarked <-factor(df1$Embarked)

#store all formulae in a list
formulae <- lapply(colnames(df1)[3:ncol(df1)], function(x) as.formula(paste0(x, " ~ Sex * Embarked")))

#go through list and run aov()
results <- lapply(formulae, function(x) summary(aov(x, data = df1)))
names(results) <- format(formulae)
results

Solution

  • You can extract the relevant statistics from the summary or store the model in a list and use broom::tidy on it to get all the stats together in a dataframe. Use map functions to run it on list of models.

    library(purrr)
    library(broom)
    
    results <- lapply(formulae, function(x) aov(x, data = df1))
    names(results) <- format(formulae)
    data <- map_df(results, tidy, .id = 'formulae')
    data
    
    # A tibble: 28 x 7
    #   formulae       term       df    sumsq  meansq statistic   p.value
    #   <chr>          <chr>   <dbl>    <dbl>   <dbl>     <dbl>     <dbl>
    # 1 PassengerId ~… Sex         1  1.09e+5 1.09e+5     1.65   1.99e- 1
    # 2 PassengerId ~… Embark…     2  5.50e+4 2.75e+4     0.416  6.60e- 1
    # 3 PassengerId ~… Sex:Em…     2  7.73e+4 3.86e+4     0.584  5.58e- 1
    # 4 PassengerId ~… Residu…   883  5.84e+7 6.61e+4    NA     NA       
    # 5 Survived ~ Se… Sex         1  6.16e+1 6.16e+1   376.     4.44e-70
    # 6 Survived ~ Se… Embark…     2  3.32e+0 1.66e+0    10.1    4.39e- 5
    # 7 Survived ~ Se… Sex:Em…     2  4.85e-1 2.43e-1     1.48   2.28e- 1
    # 8 Survived ~ Se… Residu…   883  1.45e+2 1.64e-1    NA     NA       
    # 9 Pclass ~ Sex … Sex         1  1.01e+1 1.01e+1    16.2    6.12e- 5
    #10 Pclass ~ Sex … Embark…     2  5.83e+1 2.91e+1    46.8    4.74e-20
    # … with 18 more rows
    

    Write data to csv.

    write_csv(data, 'data.csv')