Tags: r, dplyr, group-by, frequency, summarytools

How do I calculate frequencies and descriptive statistics with summarytools for all columns of a data frame when applying grouping and weights?


In R, I have a data frame (mydf) that looks like this:

weight gender var1 var2
100 M 1 3
800 F 2 8
10 F 2 9
150 F 4 10

(The real data has 100 columns (var3, var4, etc.) and 2000 rows.)

I want to calculate weighted frequencies and descriptive statistics for each variable (the "var" columns), grouped by gender. On the ungrouped data, I used the freq() and descr() functions from the summarytools package, and that worked fine. My code was:

library(summarytools)

## generate descriptive stats and specify weight
mydf_descr <- descr(mydf, weights = mydf$weight)

## generate frequency tables and specify weight
mydf_freq <- freq(mydf, weights = mydf$weight)

However, when I apply grouping, I get errors. My code is:

library(dplyr)

mydf_descr_gender <- mydf %>% 
  group_by(gender) %>% 
  descr(., weights = mydf$weight)

This fails with the error:

Error in descr(x = as_tibble(var_obj)[gr_inds[[g]], ], stats = stats,  : 
  weights vector must have same length as 'x'

I get the same error with the freq function.
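For context, the first error message points to a length mismatch. mydf$weight is the weight column of the whole, ungrouped data frame, while the error call (descr(x = as_tibble(var_obj)[gr_inds[[g]], ], ...)) shows that descr() runs on one group's rows at a time. The weights vector is therefore longer than the data in every group. A quick base R check on the four sample rows (the only data assumed here) makes the difference visible:

```r
# Four sample rows from the question (the real data has 2000 rows)
mydf <- data.frame(
  weight = c(100, 800, 10, 150),
  gender = c("M", "F", "F", "F"),
  var1   = c(1, 2, 2, 4),
  var2   = c(3, 8, 9, 10)
)

# What weights = mydf$weight passes: one weight per row of the full data
n_weights <- length(mydf$weight)

# Number of rows each per-group call actually receives
group_sizes <- table(mydf$gender)

n_weights
group_sizes
```

Every group is smaller than the full weight vector, so each per-group call sees mismatched lengths.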

I also tried:

mdf_freq_gen <- mydf %>%
    group_by(gender) %>%
    summarise_all(~ freq(., weights = weight))

This gave the error:

Error in `summarise()`:
ℹ In argument: `var1 = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
ℹ In group 1: `gender = 1`.
Caused by error in `freq()`:
! weights vector must have same length as 'x'
Run `rlang::last_trace()` to see where the error occurred.

I've tried various approaches, but I can't get the function to run on grouped data with the weights included (it works fine without weighting). I'm sure I'm missing something obvious!

Any help/ideas would be much appreciated!


Solution

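  • You can split the dataset by gender and apply the function to each subset. Each subset then carries its own weight column, so .x$weight has exactly as many values as the subset has rows, which avoids the length mismatch caused by passing the full mydf$weight.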

    library(dplyr)
    library(summarytools)
    
    mydf %>% 
      split(.$gender) %>% 
      purrr::map(~descr(.x, weights = .x$weight))
    

    The same can also be achieved with group_map.

    mydf %>% 
      group_by(gender) %>% 
      group_map(~descr(.x, weights = .x$weight))
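To sanity-check the per-group weighted results, a package-free cross-check in base R can help (this is an addition, not part of the answer above). weighted.mean() applied inside split() should reproduce the weighted means, and xtabs() with the weight on the left-hand side gives the sum of weights for each value and gender, which is the usual definition of a weighted count. The sketch uses only the four sample rows, and selecting columns with startsWith(names(mydf), "var") is an assumption about how the real columns are named.

```r
# Four sample rows from the question (the real data has ~100 var columns)
mydf <- data.frame(
  weight = c(100, 800, 10, 150),
  gender = c("M", "F", "F", "F"),
  var1   = c(1, 2, 2, 4),
  var2   = c(3, 8, 9, 10)
)

# Assumption: every analysis column name starts with "var"
var_cols <- names(mydf)[startsWith(names(mydf), "var")]

# Split once, so each subset keeps its own matching weight column
by_gender <- split(mydf, mydf$gender)

# Weighted mean of each var column within each gender group
# (rows = var columns, columns = gender levels)
w_means <- sapply(by_gender, function(d) {
  sapply(d[var_cols], weighted.mean, w = d$weight)
})
w_means

# Weighted frequency table for var1: sum of weights per value, by gender
w_freq_var1 <- xtabs(weight ~ var1 + gender, data = mydf)
w_freq_var1
```

For the "F" group, for example, the weighted mean of var1 is (2 * 800 + 2 * 10 + 4 * 150) / 960, and the value 2 receives a weighted count of 810; these figures can be compared against the corresponding descr() and freq() output.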