How to use across in an anonymous function in R


I had no problem computing summary statistics for the tibble df with this script:

library(dplyr)
library(purrr)

set.seed(123)

df <- tibble(
  a = runif(5),
  b = runif(5)
)

funs <- lst(min, median, mean, max, sd)

sum_df1 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), .x, na.rm = TRUE)),
  .id = "statistic"
)

sum_df1

But passing extra arguments such as na.rm = TRUE through the ... of across() is deprecated (as of dplyr 1.1.0). So I tried the following, without success:

# Due to deprecation
sum_df2 <- map_dfr(funs,
  ~ summarize(df, across(where(is.numeric), \(x) na.rm = TRUE)),
  .id = "statistic"
)

# Error: only Booleans
sum_df2

Solution

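  • The attempt fails because the anonymous function \(x) na.rm = TRUE never calls a summary function. Call the function explicitly inside the lambda instead. Here col refers to the column and .x refers to the function supplied by map_dfr: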

    sum_df2 <- map_dfr(funs,
      ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE))),
      .id = "statistic"
    )
    
    identical(sum_df2, sum_df1)
    ## [1] TRUE
    

    Or we can do it the other way around, where f is the function and .x is the column:

    sum_df3 <- map_dfr(funs,
      \(f) summarize(df, across(where(is.numeric), ~ f(.x, na.rm = TRUE))),
      .id = "statistic"
    )
    
    identical(sum_df3, sum_df1)
    ## [1] TRUE
    

    Or we could avoid ~ entirely, with f as the function and col as the column:

    sum_df4 <- map_dfr(funs,
      \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE))),
      .id = "statistic"
    )
    
    identical(sum_df4, sum_df1)
    ## [1] TRUE
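
    All three variants rest on the same idea: the function given to across() receives one column at a time and must itself call the summary function with any extra arguments. As a minimal sketch of that idea for a single statistic (the name one_stat is only illustrative, and df is rebuilt so the snippet runs on its own):

    ```r
    library(dplyr)

    set.seed(123)
    df <- tibble(a = runif(5), b = runif(5))

    # Deprecated form: extra arguments passed through `...` of across()
    # summarize(df, across(where(is.numeric), mean, na.rm = TRUE))

    # Replacement: the lambda receives each column and calls mean() itself
    one_stat <- summarize(
      df,
      across(where(is.numeric), \(col) mean(col, na.rm = TRUE))
    )

    one_stat
    ```

    The map_dfr versions above do the same thing, only with the call to mean() replaced by whichever function from funs is current.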
    

    As an aside, ?map_dfr indicates that it has been superseded. Superseded is not the same as deprecated, so it is fine to keep using it, but the preferred approach is to call map() and combine the results in a separate step, for example with bind_rows(). In that case we would redo sum_df2 like this (and analogously for sum_df3 and sum_df4):

    sum_df5 <- map(funs,
      ~ summarize(df, across(where(is.numeric), \(col) .x(col, na.rm = TRUE)))
    ) |>
      bind_rows(.id = "statistic")
    
    identical(sum_df5, sum_df1)
    ## [1] TRUE
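
The purrr documentation for the superseded map_dfr() suggests map() followed by list_rbind() rather than bind_rows(). A sketch of that variant, assuming purrr 1.0.0 or later (sum_df6 is an illustrative name, and the data are rebuilt so the snippet is self-contained):

```r
library(dplyr)
library(purrr)

set.seed(123)
df <- tibble(a = runif(5), b = runif(5))

# lst() names each element after the function it holds
funs <- lst(min, median, mean, max, sd)

# One single-row summary per function, then row-bind the list;
# names_to stores the list names in a "statistic" column
sum_df6 <- map(
  funs,
  \(f) summarize(df, across(where(is.numeric), \(col) f(col, na.rm = TRUE)))
) |>
  list_rbind(names_to = "statistic")

sum_df6
```

Here names_to plays the role that .id plays in map_dfr() and bind_rows().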