rfunctiondplyrbroomglance

Error: No glance method recognized for this list


I'm trying to write a function that can flexibly group by a variable number of arguments and fit a linear model to each subset. The output should be a table with each row showing the grouping variable(s) and corresponding lm call results that broom::glance provides. But I can't figure out how to structure the output. Code that produces the same error is as follows:

library(dplyr)
library(broom)

test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>% 
    group_by(across(all_of(c('gear', x)))) %>% 
    mutate(mod = list(lm(hp ~ !!sym(var1), data = .))) %>% 
    summarize(broom::glance(mod)) 
}

test_fcn('qsec', 'cyl', 'carb')

I'm pushing my R/dplyr comfort zone by mixing static and dynamic variable arguments, so I've left them here in case that's a contributing factor. Thanks for any input!


Solution

  • You were nearly there.

    test_fcn <- function(var1, ...) {
        x <- unlist(list(...))
        mtcars %>% 
            group_by(across(all_of(c('gear', x)))) %>% 
            summarise(
                mod = list(lm(hp ~ !!sym(var1), data = .)),
                mod = map(mod, broom::glance),
                .groups = "drop")
    }
    
    test_fcn('qsec', 'cyl', 'carb') %>% unnest(mod)
    ## A tibble: 12 × 15
    #    gear   cyl  carb r.squared adj.r.sq…¹ sigma stati…² p.value    df logLik   AIC   BIC devia…³ df.re…⁴
    #   <dbl> <dbl> <dbl>     <dbl>      <dbl> <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>   <int>
    # 1     3     4     1     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 2     3     6     1     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 3     3     8     2     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 4     3     8     3     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 5     3     8     4     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 6     4     4     1     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 7     4     4     2     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 8     4     6     4     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    # 9     5     4     2     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    #10     5     6     6     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    #11     5     8     4     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    #12     5     8     8     0.502      0.485  49.2    30.2 5.77e-6     1  -169.  344.  348.  72633.      30
    ## … with 1 more variable: nobs <int>, and abbreviated variable names ¹​adj.r.squared, ²​statistic,
    ##   ³​deviance, ⁴​df.residual
    ## ℹ Use `colnames()` to see all variable names
    

    Because you are storing the lm fit objects in a list, you need to loop over the entries using purrr::map.

    You might want to put the unnest into the test_fcn: a slightly more compact version would be

    test_fcn <- function(var1, ...) {
        x <- unlist(list(...))
        mtcars %>% 
            group_by(across(all_of(c('gear', x)))) %>% 
            summarise(
                mod = map(list(lm(hp ~ !!sym(var1), data = .)), broom::glance),
                .groups = "drop") %>%
            unnest(mod)
    }
    

    Update

    Until your comment, I hadn't realised that the grouping was ignored. Here is a nest-unnest-type solution.

    test_fcn <- function(var1, ...) {
        x <- list(...)
        mtcars %>% 
            group_by(across(all_of(c('gear', x)))) %>%
            nest() %>%
            ungroup() %>%
            mutate(mod = map(
                data, 
                ~ lm(hp ~ !!sym(var1), data = .x) %>% broom::glance())) %>%
            unnest(mod)
    }
    
    test_fcn('qsec', 'cyl', 'carb')
    ## A tibble: 12 × 16
    #     cyl  gear  carb data     r.squared adj.r.s…¹      sigma statis…²  p.value    df logLik
    #   <dbl> <dbl> <dbl> <list>       <dbl>     <dbl>      <dbl>    <dbl>    <dbl> <dbl>  <dbl>
    # 1     6     4     4 <tibble>    0.911     0.867    2.74e+ 0  20.5      0.0454     1  -8.32
    # 2     4     4     1 <tibble>    0.525     0.287    1.15e+ 1   2.21     0.276      1 -14.1 
    # 3     6     3     1 <tibble>    1       NaN      NaN        NaN      NaN          1 Inf   
    # 4     8     3     2 <tibble>    0.0262   -0.461    1.74e+ 1   0.0538   0.838      1 -15.7 
    # 5     8     3     4 <tibble>    0.869     0.825    7.48e+ 0  19.9      0.0210     1 -15.9 
    # 6     4     4     2 <tibble>    0.0721   -0.392    3.18e+ 1   0.155    0.732      1 -18.1 
    # 7     8     3     3 <tibble>    0.538     0.0769   2.63e-14   1.17     0.475      1  91.2 
    # 8     4     3     1 <tibble>    0         0      NaN         NA       NA         NA Inf   
    # 9     4     5     2 <tibble>    1       NaN      NaN        NaN      NaN          1 Inf   
    #10     8     5     4 <tibble>    0         0      NaN         NA       NA         NA Inf   
    #11     6     5     6 <tibble>    0         0      NaN         NA       NA         NA Inf   
    #12     8     5     8 <tibble>    0         0      NaN         NA       NA         NA Inf   
    ## … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>,
    ##   nobs <int>, and abbreviated variable names ¹​adj.r.squared, ²​statistic
    ## ℹ Use `colnames()` to see all variable names
    

    Explanation: tidyr::nest nests data in a list column (with name data by default); we can then loop through the data entries, fit the model and extract model summaries with broom::glance in a new column mod; unnesting mod then gives the desired structure. If not needed, you can remove the data column with select(-data).

    PS. The example produces some warnings (leading to NAs in the model summaries) from those groups where you have only a single observation.