I'd like to keep only the rows whose x1, x2, and x3 values lie between the 5th and 95th quantiles within each group (id). But I haven't managed to combine across() with my variables (x1, x2, and x3). In my example:
library(dplyr)
data <- tibble::tibble(id = paste0("sample_", rep(1:10, 10)), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
data %>%
group_by(id) %>%
dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) && x < quantile(x, 0.95)))
#Error: Problem with `filter()` input `..1`.
#i Input `..1` is `across(...)`.
#i The error occurred in group 1: id = "sample_1".
Your function will run if you change the code to use the vectorized & ("AND") operator between the two conditions, so that every row is tested rather than a single value.
data %>%
group_by(id) %>%
dplyr::filter(across(x1:x3, function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)))
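To illustrate why the single & matters, here is a minimal sketch in base R. The vector x and the bounds lower and upper are made-up values for illustration only; they are not taken from the data above.

```r
# Hypothetical values, chosen only to show how & behaves on a vector
x <- c(-2, 0, 2)
lower <- -1
upper <- 1

# `&` compares element by element and returns one logical per element,
# which is the shape filter() needs (one TRUE/FALSE per row)
inside <- x > lower & x < upper
print(inside)

# `&&` is meant for single TRUE/FALSE values (e.g. inside if ()), so it
# is not suitable for testing a whole column at once
```

The result has the same length as x, so each row gets its own keep/drop decision.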
You can also shorten the code with:
data %>%
group_by(id) %>%
filter(across(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
However, I think filter() is intended to be used with either if_all() or if_any() (introduced in dplyr 1.0.4), depending on whether you want all selected columns or any selected column to fulfill the condition.
For example:
data %>%
group_by(id) %>%
filter(if_all(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
data %>%
group_by(id) %>%
filter(if_any(x1:x3, ~ .x > quantile(.x, 0.05) & .x < quantile(.x, 0.95)))
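The difference between the two is easier to see on a tiny data set with fixed bounds. This is only a sketch: the toy tibble and the in_range() helper (with bounds 2 and 8) are hypothetical and stand in for the quantile condition above. It assumes dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical three-row example
toy <- tibble::tibble(
  id = c("a", "a", "a"),
  x1 = c(1, 5, 9),
  x2 = c(5, 5, 1)
)

# Hypothetical helper: TRUE where a value lies strictly between 2 and 8
in_range <- function(x) x > 2 & x < 8

# if_all(): keep rows where every selected column passes
all_rows <- toy %>% filter(if_all(x1:x2, in_range))

# if_any(): keep rows where at least one selected column passes
any_rows <- toy %>% filter(if_any(x1:x2, in_range))

nrow(all_rows)  # only the row with x1 = 5, x2 = 5 passes in both columns
nrow(any_rows)  # the row with x1 = 1, x2 = 5 is kept as well
```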
In your case, if_all() and across() give the same results, but I'm not sure if across() is guaranteed to always behave the same as if_all().
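If you want to check what if_all() does on your data without relying on across(), one option is to compare it with the condition written out column by column. This is a sketch under a few assumptions: set.seed(1) and the helper name inside() are my additions, and dplyr 1.0.4 or later is installed.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the comparison is reproducible
data <- tibble::tibble(
  id = paste0("sample_", rep(1:10, 10)),
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100)
)

# Hypothetical helper: TRUE where x lies strictly between its own
# 5th and 95th quantiles (computed per group inside a grouped filter)
inside <- function(x) x > quantile(x, 0.05) & x < quantile(x, 0.95)

with_if_all <- data %>%
  group_by(id) %>%
  filter(if_all(x1:x3, inside))

# The same condition spelled out for each column and combined with &
spelled_out <- data %>%
  group_by(id) %>%
  filter(inside(x1) & inside(x2) & inside(x3))

all.equal(as.data.frame(with_if_all), as.data.frame(spelled_out))
```

If the two results match, if_all() is doing exactly the "every column must pass" filter described above.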