rdplyrcustom-functionacross

Applying custom function to many columns in dataframe using across R


I have a dataframe, let's call it ex_ds with variables v1 to v15

> head(ex_ds)
  v1   v2 v3 v4 v5 v6 v7   v8 v9 v10 v11  v12  v13 v14 v15
1  5 2014  1  2  4  1  1    8  4   2   2    2    2   2   2
2  5 2014  2  6  1  3  8 <NA>  1   1   2    2    2   1   2
3  5 2014  2  5  2  1  1    8  1   1   1 <NA> <NA>   1   2
4  5 2014  2  2  1  4  1    2  5   2   2    2    2   2   2
5  5 2014  1  5  2  1  8    7  4   1   1    2    2   2   2
6  5 2014  2  4  3  5  3    1  3   2   2    2    2   2   2

I would like to group by each v1, v2 combination, and count the number of non-NA responses for each variable from v3 to v15.

Here is my custom function:

n_fn_ex = function(var){
  n_var = ex_ds %>% 
    drop_na(var) %>%
    group_by(v1,v2) %>%
    count() 
}

It works for one variable specified individually:

> n_fn_ex("v3")
# A tibble: 1 x 3
# Groups:   v1, v2 [1]
     v1    v2     n
  <int> <int> <int>
1     5  2014     6


> n_fn_ex("v8")
# A tibble: 1 x 3
# Groups:   v1, v2 [1]
     v1    v2     n
  <int> <int> <int>
1     5  2014     5

But it doesn't work in my across statement:

ex_ds_1 = ex_ds %>%
  reframe(across(v3:v15, n_fn_ex))

> ex_ds_1 = ex_ds %>%
+   reframe(across(v3:v15, n_fn_ex))
Error in `reframe()`:
i In argument: `across(v3:v15, n_fn_ex)`.
Caused by error in `across()`:
! Can't compute column `v3`.
Caused by error in `drop_na()`:
! Can't subset columns that don't exist.
x Columns `1`, `2`, `2`, `2`, `1`, etc. don't exist.
Run `rlang::last_trace()` to see where the error occurred.

I've also tried the following which result in the same error:

ex_ds_1 = ex_ds %>%
  reframe(across("v3":"v15", n_fn_ex))

same with

n_fn_ex_2 = function(var){
  n_var = ex_ds %>% 
    drop_na(var) %>%
    group_by(v1,v2) %>%
    count() 
  return(n_var$n)
}

ex_ds_1 = ex_ds %>%
  mutate(across("v3":"v15", ~n_fn_ex_2(.)))


I followed the dplyr across page: https://dplyr.tidyverse.org/reference/across.html and another post: Apply Multiple Columns to Custom function Using dplyr::mutate(across())

Example dataset:

ex_ds = structure(list(v1 = c(5L, 5L, 5L, 5L, 5L, 5L), v2 = c(2014L, 
2014L, 2014L, 2014L, 2014L, 2014L), v3 = structure(c(1L, 2L, 
2L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), v4 = structure(c(2L, 
6L, 5L, 2L, 5L, 4L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"), 
    v5 = structure(c(4L, 1L, 2L, 1L, 2L, 3L), .Label = c("1", 
    "2", "3", "4"), class = "factor"), v6 = structure(c(1L, 3L, 
    1L, 4L, 1L, 5L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"), 
    v7 = structure(c(1L, 8L, 1L, 1L, 8L, 3L), .Label = c("1", 
    "2", "3", "4", "5", "6", "7", "8"), class = "factor"), v8 = structure(c(8L, 
    NA, 8L, 2L, 7L, 1L), .Label = c("1", "2", "3", "4", "5", 
    "6", "7", "8"), class = "factor"), v9 = structure(c(4L, 1L, 
    1L, 5L, 4L, 3L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), 
    v10 = structure(c(2L, 1L, 1L, 2L, 1L, 2L), .Label = c("1", 
    "2"), class = "factor"), v11 = structure(c(2L, 2L, 1L, 2L, 
    1L, 2L), .Label = c("1", "2"), class = "factor"), v12 = structure(c(2L, 
    2L, NA, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    v13 = structure(c(2L, 2L, NA, 2L, 2L, 2L), .Label = c("1", 
    "2"), class = "factor"), v14 = structure(c(2L, 1L, 1L, 2L, 
    2L, 2L), .Label = c("1", "2"), class = "factor"), v15 = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), row.names = c(NA, 
6L), class = "data.frame")

Solution

  • If you don't absolutely need to make a function for this, you can get the desired result with:

    ex_ds %>%
      group_by(v1, v2) %>%
      summarise_at(vars(v3:v15), ~ sum(!is.na(.x)))
    

    Update:

    As @Onyambu commented, summarise_at(vars(...)) has been deprecated, in favour of summarise(across(...)). You can rewrite the example above as

    ex_ds %>%
      group_by(v1, v2) %>%
      summarise(across(v3:v15, ~ sum(!is.na(.x))))