rlinear-regressionpurrrtidybroom

R split/map from purrr combined with tidy from broom to get linear model statistics by group


I found this code online at tidyverse.org at this link:

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")

enter image description here

The code works as expected. I'm now practicing with this same structure but using a long dataframe. You can see the code; it's mostly the same. First I convert to a tibble, add rownames for cars, select numeric variables, and make the dataframe a long data frame.

mtcars <- as_tibble(mtcars, rownames = 'car')

mtcars_numeric <- mtcars %>%
  select(car, mpg, disp, hp, drat, wt, qsec) 

mtcars_long_numeric <- pivot_longer(mtcars_numeric, names_to = 'names', values_to = 'values', 3:7)


mtcars_long_numeric %>%
  split(.$names) %>%
  map(~ lm(mpg ~ values, data = .)) %>%
  map(summary) %>%
  map_df("r.squared") %>%
  pivot_longer(., names_to = 'explanatory_variable_to_mpg', values_to = 'r_squared', 1:5) %>%
  arrange(desc(r_squared))

enter image description here

But what about other model statistics like p-value? How do I extract that? If I just change "r.squared" to "p.value" it doesn't work. I've tried other variations like "p_value" and "pvalue" and it doesn't work. I also don't know how to find the right names for these objects.

enter image description here

I can create a linear model object and look at the r.squared in the summary and get the right value.

mtcars_linear_model <- lm(mpg ~ wt, mtcars)

summary(mtcars_linear_model)$r.squared

enter image description here

...But outside of this vignette I don't know how I would have known that r.squared existed in the summary of linear model. If I just type the dollar sign after the summary(lm) I get values that don't exist. (Is this a bug?)

enter image description here

Then I tried a different tactic. I can see that if I use broom and tidy the linear model object I have other statistics:

broom::tidy(mtcars_linear_model)

enter image description here

Is there any way to add the broom::tidy function to these data frames involving purrr:map? The purpose would be to figure out how to extract other model statistics like p-value. Also, how do I find a comprehensive list of items I can extract from the summary of a linear model object summary(lm)$'?'

The following code doesn't work. I tried a few variations like %>% tidy() or else to wrap tidy around map(summary) like this: tidy(map(summary)) but it doesn't work.

mtcars_long_numeric %>%
  split(.$names) %>%
  map(~ lm(mpg ~ values, data = .)) %>%
  map(summary) %>%
  tidy() %>% #### ????????
  map_df("r.squared") %>%
  pivot_longer(., names_to = 'explanatory_variable_to_mpg', values_to = 'r_squared', 1:5) %>%
  arrange(desc(r_squared))

Solution

  • This?. You need to use glance instead of tidy for model statistics.

    mtcars_long_numeric %>% 
      nest_by(names) %>% 
      mutate(model = list(lm(mpg ~ values, data = data))) %>% 
      summarise(glance(model))
    `summarise()` has grouped output by 'names'. You can override using the `.groups` argument.
    # A tibble: 5 × 13
    # Groups:   names [5]
      names r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <chr>     <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
    1 disp      0.718         0.709  3.25     76.5  9.38e-10     1  -82.1  170.  175.     317.          30    32
    2 drat      0.464         0.446  4.49     26.0  1.78e- 5     1  -92.4  191.  195.     604.          30    32
    3 hp        0.602         0.589  3.86     45.5  1.79e- 7     1  -87.6  181.  186.     448.          30    32
    4 qsec      0.175         0.148  5.56      6.38 1.71e- 2     1  -99.3  205.  209.     929.          30    32
    5 wt        0.753         0.745  3.05     91.4  1.29e-10     1  -80.0  166.  170.     278.          30    32