I have a dataframe, let's call it ex_ds
with variables v1
to v15
> head(ex_ds)
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15
1 5 2014 1 2 4 1 1 8 4 2 2 2 2 2 2
2 5 2014 2 6 1 3 8 <NA> 1 1 2 2 2 1 2
3 5 2014 2 5 2 1 1 8 1 1 1 <NA> <NA> 1 2
4 5 2014 2 2 1 4 1 2 5 2 2 2 2 2 2
5 5 2014 1 5 2 1 8 7 4 1 1 2 2 2 2
6 5 2014 2 4 3 5 3 1 3 2 2 2 2 2 2
I would like to group by each v1, v2
combination, and count the number of non-NA responses for each variable from v3
to v15
.
Here is my custom function:
n_fn_ex = function(var){
n_var = ex_ds %>%
drop_na(var) %>%
group_by(v1,v2) %>%
count()
}
It works for one variable specified individually:
> n_fn_ex("v3")
# A tibble: 1 x 3
# Groups: v1, v2 [1]
v1 v2 n
<int> <int> <int>
1 5 2014 6
> n_fn_ex("v8")
# A tibble: 1 x 3
# Groups: v1, v2 [1]
v1 v2 n
<int> <int> <int>
1 5 2014 5
But it doesn't work in my across statement:
ex_ds_1 = ex_ds %>%
reframe(across(v3:v15, n_fn_ex))
> ex_ds_1 = ex_ds %>%
+ reframe(across(v3:v15, n_fn_ex))
Error in `reframe()`:
i In argument: `across(v3:v15, n_fn_ex)`.
Caused by error in `across()`:
! Can't compute column `v3`.
Caused by error in `drop_na()`:
! Can't subset columns that don't exist.
x Columns `1`, `2`, `2`, `2`, `1`, etc. don't exist.
Run `rlang::last_trace()` to see where the error occurred.
I've also tried the following which result in the same error:
ex_ds_1 = ex_ds %>%
reframe(across("v3":"v15", n_fn_ex))
same with
n_fn_ex_2 = function(var){
n_var = ex_ds %>%
drop_na(var) %>%
group_by(v1,v2) %>%
count()
return(n_var$n)
}
ex_ds_1 = ex_ds %>%
mutate(across("v3":"v15", ~n_fn_ex_2(.)))
I followed the dplyr across page: https://dplyr.tidyverse.org/reference/across.html and another post: Apply Multiple Columns to Custom function Using dplyr::mutate(across())
Example dataset:
ex_ds = structure(list(v1 = c(5L, 5L, 5L, 5L, 5L, 5L), v2 = c(2014L,
2014L, 2014L, 2014L, 2014L, 2014L), v3 = structure(c(1L, 2L,
2L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), v4 = structure(c(2L,
6L, 5L, 2L, 5L, 4L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
v5 = structure(c(4L, 1L, 2L, 1L, 2L, 3L), .Label = c("1",
"2", "3", "4"), class = "factor"), v6 = structure(c(1L, 3L,
1L, 4L, 1L, 5L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
v7 = structure(c(1L, 8L, 1L, 1L, 8L, 3L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), v8 = structure(c(8L,
NA, 8L, 2L, 7L, 1L), .Label = c("1", "2", "3", "4", "5",
"6", "7", "8"), class = "factor"), v9 = structure(c(4L, 1L,
1L, 5L, 4L, 3L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
v10 = structure(c(2L, 1L, 1L, 2L, 1L, 2L), .Label = c("1",
"2"), class = "factor"), v11 = structure(c(2L, 2L, 1L, 2L,
1L, 2L), .Label = c("1", "2"), class = "factor"), v12 = structure(c(2L,
2L, NA, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
v13 = structure(c(2L, 2L, NA, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"), v14 = structure(c(2L, 1L, 1L, 2L,
2L, 2L), .Label = c("1", "2"), class = "factor"), v15 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
If you don't absolutely need to make a function for this, you can get the desired result with:
ex_ds %>%
group_by(v1, v2) %>%
summarise_at(vars(v3:v15), ~ sum(!is.na(.x)))
As @Onyambu commented, summarise_at(vars(...))
has been deprecated, in favour of summarise(across(...))
. You can rewrite the example above as
ex_ds %>%
group_by(v1, v2) %>%
summarise(across(v3:v15, ~ sum(!is.na(.x))))