I found this code online at tidyverse.org at this link:
mtcars %>%
split(.$cyl) %>%
map(~ lm(mpg ~ wt, data = .)) %>%
map(summary) %>%
map_dbl("r.squared")
The code works as expected. I'm now practicing with this same structure but using a long dataframe. You can see the code; it's mostly the same. First I convert to a tibble, add rownames for cars, select numeric variables, and make the dataframe a long data frame.
mtcars <- as_tibble(mtcars, rownames = 'car')
mtcars_numeric <- mtcars %>%
select(car, mpg, disp, hp, drat, wt, qsec)
mtcars_long_numeric <- pivot_longer(mtcars_numeric, names_to = 'names', values_to = 'values', 3:7)
mtcars_long_numeric %>%
split(.$names) %>%
map(~ lm(mpg ~ values, data = .)) %>%
map(summary) %>%
map_df("r.squared") %>%
pivot_longer(., names_to = 'explanatory_variable_to_mpg', values_to = 'r_squared', 1:5) %>%
arrange(desc(r_squared))
But what about other model statistics like p-value? How do I extract that? If I just change "r.squared" to "p.value" it doesn't work. I've tried other variations like "p_value" and "pvalue" and it doesn't work. I also don't know how to find the right names for these objects.
I can create a linear model object and look at the r.squared in the summary and get the right value.
mtcars_linear_model <- lm(mpg ~ wt, mtcars)
summary(mtcars_linear_model)$r.squared
...But outside of this vignette I don't know how I would have known that r.squared existed in the summary of linear model. If I just type the dollar sign after the summary(lm) I get values that don't exist. (Is this a bug?)
Then I tried a different tactic. I can see that if I use broom and tidy the linear model object I have other statistics:
broom::tidy(mtcars_linear_model)
Is there any way to add the broom::tidy function to these data frames involving purrr:map? The purpose would be to figure out how to extract other model statistics like p-value. Also, how do I find a comprehensive list of items I can extract from the summary of a linear model object summary(lm)$'?'
The following code doesn't work. I tried a few variations like %>% tidy() or else to wrap tidy around map(summary) like this: tidy(map(summary)) but it doesn't work.
mtcars_long_numeric %>%
split(.$names) %>%
map(~ lm(mpg ~ values, data = .)) %>%
map(summary) %>%
tidy() %>% #### ????????
map_df("r.squared") %>%
pivot_longer(., names_to = 'explanatory_variable_to_mpg', values_to = 'r_squared', 1:5) %>%
arrange(desc(r_squared))
This?. You need to use glance
instead of tidy
for model statistics.
mtcars_long_numeric %>%
nest_by(names) %>%
mutate(model = list(lm(mpg ~ values, data = data))) %>%
summarise(glance(model))
`summarise()` has grouped output by 'names'. You can override using the `.groups` argument.
# A tibble: 5 × 13
# Groups: names [5]
names r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 disp 0.718 0.709 3.25 76.5 9.38e-10 1 -82.1 170. 175. 317. 30 32
2 drat 0.464 0.446 4.49 26.0 1.78e- 5 1 -92.4 191. 195. 604. 30 32
3 hp 0.602 0.589 3.86 45.5 1.79e- 7 1 -87.6 181. 186. 448. 30 32
4 qsec 0.175 0.148 5.56 6.38 1.71e- 2 1 -99.3 205. 209. 929. 30 32
5 wt 0.753 0.745 3.05 91.4 1.29e-10 1 -80.0 166. 170. 278. 30 32