I've figured out how to run a 2-way anova on several variables in my data frame, but not sure how to get this into a format that could be easily exported to a csv file or excel. Ideally, I'd like it to have this in a format where each of my several hundred dependent variables is in it's own row, with the pVaules and Fvalues
I've made an example using the titanic dataset. In this case I've set Sex & Embarked as my categorical variables, and would like the output for the effects of Sex
Embarked
and ~Interaction
somehow saved to a file. I'm open to suggestions on how to output this -- just want to be able to easily identify what values are significant, ideally with each dependent variable on its own line.
library(titanic)
library(tidyverse)
df1 <-
titanic_train %>%
select(Sex, Embarked, (1:10)) %>%
select(!("Name" | "Ticket")) %>%
filter(Embarked != "") # deleting empty Embarked status
names(df1)
df1$Sex<- factor(df1$Sex)
df1$Embarked <-factor(df1$Embarked)
#store all formulae in a list
formulae <- lapply(colnames(df1)[3:ncol(df1)], function(x) as.formula(paste0(x, " ~ Sex * Embarked")))
#go through list and run aov()
results <- lapply(formulae, function(x) summary(aov(x, data = df1)))
names(results) <- format(formulae)
results
You can extract the relevant statistics from the summary
or store the model in a list and use broom::tidy
on it to get all the stats together in a dataframe. Use map
functions to run it on list of models.
library(purrr)
library(broom)
results <- lapply(formulae, function(x) aov(x, data = df1))
names(results) <- format(formulae)
data <- map_df(results, tidy, .id = 'formulae')
data
# A tibble: 28 x 7
# formulae term df sumsq meansq statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 PassengerId ~… Sex 1 1.09e+5 1.09e+5 1.65 1.99e- 1
# 2 PassengerId ~… Embark… 2 5.50e+4 2.75e+4 0.416 6.60e- 1
# 3 PassengerId ~… Sex:Em… 2 7.73e+4 3.86e+4 0.584 5.58e- 1
# 4 PassengerId ~… Residu… 883 5.84e+7 6.61e+4 NA NA
# 5 Survived ~ Se… Sex 1 6.16e+1 6.16e+1 376. 4.44e-70
# 6 Survived ~ Se… Embark… 2 3.32e+0 1.66e+0 10.1 4.39e- 5
# 7 Survived ~ Se… Sex:Em… 2 4.85e-1 2.43e-1 1.48 2.28e- 1
# 8 Survived ~ Se… Residu… 883 1.45e+2 1.64e-1 NA NA
# 9 Pclass ~ Sex … Sex 1 1.01e+1 1.01e+1 16.2 6.12e- 5
#10 Pclass ~ Sex … Embark… 2 5.83e+1 2.91e+1 46.8 4.74e-20
# … with 18 more rows
Write data
to csv.
write_csv(data, 'data.csv')