
Error in custom function with group_by and summarise in dplyr in R


I am trying to write a custom function that uses dplyr::group_by and summarise, but it always fails with an error about the length of "row.names".

I simplified the data and code and found the cause. Here are the data and code:

library(dplyr)  # %>%, group_by(), summarise()
library(binom)  # binom.confint()

set.seed(123)
df = data.frame(
  Group = c("A","A","A","A","A","B","B","B","B","B", "B"),
  Con = runif(11, min = 10, max = 60),
  YN = rbinom(11, 1, 0.5)
)

cf = function(x, y){
  df %>%
  group_by(Group) %>%
  summarise(median = median(x),
            n = n(),
            YN_n = sum(y),
            YN_perc = sum(y)/n()*100,
            CI = binom.confint(sum(y), n(),
                               conf.level = 0.95, methods = "exact"))
}

cf(df$Con, df$YN)

The error seems to be caused by groups A and B having different sizes: if I change Group to 5 A and 5 B, it works.

But my actual data has groups of different sizes. How can I fix this?


Solution

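  • You need to put the call to binom.confint in a mutate and add .groups = "drop" to the summarise call. I also added a data argument to the function and wrapped the column arguments in the embrace operator {{ }} (acknowledging moodymudskipper's suggestion), so that each column is evaluated within its group.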

    cf = function(data, x, y){
      data %>%
        group_by(Group) %>%
        summarise(median = median({{x}}),
                  n = n(),
                  YN_n = sum({{y}}),
                  YN_perc = YN_n/n*100, .groups="drop") %>%
        mutate(CI = binom.confint(YN_n, n, conf.level = 0.95, methods = "exact"))
    }
    

    library(dplyr)  # %>%, group_by(), summarise(), mutate()
    library(binom)  # binom.confint()
    
    cf(df, Con, YN)
    
      Group median     n  YN_n YN_perc CI$method    $x    $n $mean $lower $upper
      <chr>  <dbl> <int> <int>   <dbl> <chr>     <int> <int> <dbl>  <dbl>  <dbl>
    1 A       49.4     5     3      60 exact         3     5   0.6  0.147  0.947
    2 B       37.0     6     3      50 exact         3     6   0.5  0.118  0.882
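
To see why the column arguments matter, here is a minimal sketch that rebuilds the question's data and contrasts the two ways of referring to a column inside summarise. The names whole and by_group are illustrative only and are not part of the original code. Passing df$Con or df$YN hands every group the full 11-value vector, whereas a bare column name (which is what {{x}} forwards from inside a function) is evaluated on the rows of the current group.

```r
library(dplyr)

# Same data as in the question
set.seed(123)
df <- data.frame(
  Group = c("A","A","A","A","A","B","B","B","B","B","B"),
  Con = runif(11, min = 10, max = 60),
  YN = rbinom(11, 1, 0.5)
)

# df$Con and df$YN ignore the grouping: each group sees all 11 values,
# so both rows get the same median and the same total of YN
whole <- df %>%
  group_by(Group) %>%
  summarise(median = median(df$Con),
            n = n(),
            YN_n = sum(df$YN),
            .groups = "drop")

# Bare column names are evaluated per group, so each row
# summarises only the rows that belong to that group
by_group <- df %>%
  group_by(Group) %>%
  summarise(median = median(Con),
            n = n(),
            YN_n = sum(YN),
            .groups = "drop")

print(whole)
print(by_group)
```

In whole, YN_n is the total over all rows in both groups. With this data the total is 6 (3 in A plus 3 in B, as in the output above), which is larger than the 5 rows in group A, so the original function asked binom.confint for 6 successes out of 5 trials. In by_group the counts never exceed the group size.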