
Combining across and filter in groups


I'd like to filter the x1, x2, and x3 values by group (id), keeping only the values that lie between the 5th and 95th quantiles. However, I haven't managed to combine across() with my variables (x1, x2, and x3). Here is my example:

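library(dplyr)

set.seed(1) # fixed seed so the example is reproducible
data <- tibble::tibble(
  id = paste0("sample_", rep(1:10, 10)), # 10 groups ("sample_1" to "sample_10"), 10 rows each
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100)
)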

data %>%
  group_by(id) %>%
  dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) 
                x < quantile(x, 0.95)))
#Error: Problem with `filter()` input `..1`.
#i Input `..1` is `across(...)`.
#i The error occurred in group 1: id = "sample_1".

Solution

  • Your function will run if you join the two conditions with & (logical AND):

    data %>%
      group_by(id) %>%
      dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)))
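
    To see what the corrected predicate keeps, here is a minimal sketch that applies the same condition to a plain vector, outside dplyr and without grouping. The input 1:100 is an illustrative assumption. With R's default quantile() method (type 7), the 5th and 95th percentiles of 1:100 are 5.95 and 95.05, so the strict comparisons keep the 90 values from 6 to 95.

    ```r
    # Illustrative input (assumption): the integers 1 to 100
    x <- 1:100

    # Same predicate as in the filter() call: strictly inside the 5th-95th percentile range
    keep <- x > quantile(x, 0.05) & x < quantile(x, 0.95)

    sum(keep)      # number of values kept
    range(x[keep]) # smallest and largest value kept
    ```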
    

    You can also shorten the code with the formula (purrr-style lambda) syntax, where .x stands for the current column:

    data %>%
      group_by(id) %>%
      filter(across(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
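
    The ~ .x formula is shorthand for a one-argument function whose argument is .x. rlang::as_function() performs this kind of formula-to-function conversion, so a minimal sketch to confirm that both spellings compute the same thing on a plain vector (again assuming the illustrative input 1:100) could be:

    ```r
    # Convert the formula shorthand into an ordinary R function
    lambda_form <- rlang::as_function(~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95))

    # The same predicate written with function(x), as in the first fix
    function_form <- function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)

    x <- 1:100 # illustrative input (assumption)
    identical(lambda_form(x), function_form(x))
    ```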
    

    However, filter() is intended to be used with either if_all() or if_any() (introduced in dplyr 1.0.4), depending on whether you want all selected columns or at least one selected column to fulfill the condition.

    For example:

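    # Keep rows where x1, x2 AND x3 all lie strictly inside their group's 5th-95th quantile range
    data %>%
      group_by(id) %>%
      filter(if_all(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
    
    # Keep rows where at least one of x1, x2, x3 lies inside that range
    data %>%
      group_by(id) %>%
      filter(if_any(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))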
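
    Conceptually, if_all() keeps a row when the condition is TRUE in every selected column (the per-column results combined with &), while if_any() keeps it when the condition is TRUE in at least one column (combined with |). A base-R sketch of that combination step, on an ungrouped toy data frame with the simpler condition x > 0 (both the data and the condition are illustrative assumptions):

    ```r
    # Toy data (assumption): row 1 passes in every column, rows 2 and 3 in only some
    df <- data.frame(x1 = c(1, -1, 2), x2 = c(3, 4, -5), x3 = c(6, 7, 8))

    # One logical vector per column, like the per-column results of the predicate
    conds <- lapply(df[c("x1", "x2", "x3")], function(x) x > 0)

    all_rows <- Reduce(`&`, conds) # TRUE where every column passes (if_all-style)
    any_rows <- Reduce(`|`, conds) # TRUE where at least one column passes (if_any-style)

    df[all_rows, ] # only row 1
    df[any_rows, ] # rows 1, 2 and 3
    ```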
    

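    In your case, if_all() and across() give the same results, because filter() combines the logical columns returned by across() with AND. Still, prefer if_all(): since dplyr 1.0.8, using across() inside filter() is deprecated in favor of if_all() and if_any().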
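
    To check the equivalence on your own data, you can run both versions and compare them with identical(). A minimal sketch, assuming the question's data regenerated with set.seed(1) (the seed is an assumption added for reproducibility) and a hypothetical helper in_range() holding the predicate. Recent dplyr versions may warn that across() in filter() is deprecated, so that warning is suppressed here:

    ```r
    library(dplyr)

    set.seed(1) # assumption: fixed seed so the comparison is reproducible
    data <- tibble::tibble(
      id = paste0("sample_", rep(1:10, 10)),
      x1 = rnorm(100),
      x2 = rnorm(100),
      x3 = rnorm(100)
    )

    # Hypothetical helper: TRUE where a value lies strictly inside the 5th-95th quantile range
    in_range <- function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)

    # across() inside filter(): may emit a deprecation warning in newer dplyr, suppressed here
    via_across <- suppressWarnings(
      data %>% group_by(id) %>% filter(across(x1:x3, in_range))
    )

    # if_all(): a row is kept only if in_range() is TRUE for all three columns in its group
    via_if_all <- data %>% group_by(id) %>% filter(if_all(x1:x3, in_range))

    # Compare the kept rows, ignoring the grouping metadata
    identical(ungroup(via_across), ungroup(via_if_all))
    ```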