I am trying to write a custom function that uses dplyr::group_by and summarise,
but it always fails with an error about the "row.names" length.
I simplified the data and code and found the cause. Here are the data and code:
library(dplyr)
library(binom)

set.seed(123)
df = data.frame(
  Group = c("A","A","A","A","A","B","B","B","B","B", "B"),
  Con = runif(11, min = 10, max = 60),
  YN = rbinom(11, 1, 0.5)
)
cf = function(x, y){
  df %>%
    group_by(Group) %>%
    summarise(median = median(x),
              n = n(),
              YN_n = sum(y),
              YN_perc = sum(y)/n()*100,
              CI = binom.confint(sum(y), n(),
                                 conf.level = 0.95, methods = "exact"))
}
cf(df$Con, df$YN)  # fails with the "row.names" length error
I found that the error is caused by groups A and B having different numbers of rows. If I change Group to 5 A and 5 B, it works.
But my actual data has groups of different sizes. How can I fix this problem?
You need to put the call to binom.confint in a mutate and add .groups="drop" to the summarise call. I also added a data argument to the function and, following moodymudskipper's suggestion, embraced the column arguments with {{ }} so that they are evaluated within each group.
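To see why the embrace matters, here is a minimal sketch on a small made-up data frame (toy and med_by_group are hypothetical names, not part of the question). Passing toy$Con hands summarise the whole column, so every group gets the overall median, while a bare or embraced column name is evaluated per group:

```r
library(dplyr)

# Hypothetical toy data: group A has 2 rows, group B has 3.
toy <- data.frame(Group = c("A", "A", "B", "B", "B"),
                  Con   = c(1, 3, 10, 20, 30))

# toy$Con is the full 5-element vector, regardless of grouping.
whole <- toy %>% group_by(Group) %>% summarise(m = median(toy$Con))

# A bare column name is looked up inside each group.
by_group <- toy %>% group_by(Group) %>% summarise(m = median(Con))

# Inside a function, {{ }} forwards the column name to summarise.
med_by_group <- function(data, x) {
  data %>% group_by(Group) %>% summarise(m = median({{ x }}))
}
embraced <- med_by_group(toy, Con)

whole$m     # 10 10
by_group$m  #  2 20
embraced$m  #  2 20
```

The same applies to sum(y) in the original cf: with df$YN it sums the full column in every group.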
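library(dplyr)
library(binom)

cf = function(data, x, y){
  data %>%
    group_by(Group) %>%
    # {{ }} evaluates the passed columns within each group
    summarise(median = median({{x}}),
              n = n(),
              YN_n = sum({{y}}),
              YN_perc = YN_n/n*100, .groups="drop") %>%
    # one confidence-interval row per group, computed after summarising
    mutate(CI = binom.confint(YN_n, n, conf.level = 0.95, methods = "exact"))
}
cf(df, Con, YN)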
  Group median     n  YN_n YN_perc CI$method    $x    $n $mean $lower $upper
  <chr>  <dbl> <int> <int>   <dbl> <chr>     <int> <int> <dbl>  <dbl>  <dbl>
1 A       49.4     5     3      60 exact         3     5   0.6  0.147  0.947
2 B       37.0     6     3      50 exact         3     6   0.5  0.118  0.882
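Note that CI in this result is a data frame nested inside a single column, which is why its parts print as CI$method, $x and so on. If ordinary columns are easier to work with, one possible sketch is to flatten it with tidyr::unpack. The use of tidyr and the name cf_flat are my additions, not part of the answer above:

```r
library(dplyr)
library(tidyr)
library(binom)

set.seed(123)
df <- data.frame(
  Group = c("A","A","A","A","A","B","B","B","B","B","B"),
  Con = runif(11, min = 10, max = 60),
  YN = rbinom(11, 1, 0.5)
)

# Same steps as cf, plus unpack() to turn the nested CI data frame
# into regular columns prefixed with "CI_".
cf_flat <- function(data, x, y) {
  data %>%
    group_by(Group) %>%
    summarise(median = median({{ x }}),
              n = n(),
              YN_n = sum({{ y }}),
              YN_perc = YN_n / n * 100,
              .groups = "drop") %>%
    mutate(CI = binom.confint(YN_n, n, conf.level = 0.95, methods = "exact")) %>%
    unpack(CI, names_sep = "_")
}

res <- cf_flat(df, Con, YN)
names(res)
```

The interval bounds are then available directly as res$CI_lower and res$CI_upper.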