Is there a way to force the evaluation order in a {tidypolars} pipe?
For example:
install.packages("tidypolars", repos = c("https://community.r-multiverse.org", 'https://cloud.r-project.org'))
library(dplyr)
set.seed(123)
lf <- tibble(
  id = 1:6,
  date = lubridate::now(tzone = "") + lubridate::days(-5:0),
  value = stringi::stri_rand_strings(6, 20)
) %>%
  tidypolars::as_polars_lf()
This works
date_filter <- lubridate::today(tzone = "") - lubridate::days(3)
lf %>%
  filter(date >= date_filter) %>%
  collect()
#>   id                date                value
#> 1  3 2025-04-04 11:33:55 8PPM98ESGr2Rn7YC7ktN
#> 2  4 2025-04-05 11:33:55 f5NHoRoonRkdi0TDNbL6
#> 3  5 2025-04-06 11:33:55 FfPm6QztsA8eLeJBm5SV
#> 4  6 2025-04-07 11:33:55 bKUxTtubP9vI3wi8YxaP
This does not:
lf %>%
  filter(date >= lubridate::today(tzone = "") - lubridate::days(3))
#> Error in `filter()`:
#> ! `tidypolars` doesn't know how to translate this function: `lubridate::today()` (from package `lubridate`).
lf %>%
  filter(date >= eval(lubridate::today(tzone = "") - lubridate::days(3)))
#> Error in `filter()`:
#> ! `tidypolars` doesn't know how to translate this function: `eval()`.
lf %>%
  filter(date >= force(lubridate::today(tzone = "") - lubridate::days(3)))
#> Error in `filter()`:
#> ! `tidypolars` doesn't know how to translate this function: `force()`.
In the development version of tidypolars (0.13.0.9000; I'm the author), tidypolars now works as-is with functions that are not translated but don't use any input from the data itself. This is explained in this section of a vignette, but I reproduce it here in case the link stops working in the future.
When a function provided in base R or in another package cannot be translated to the equivalent polars syntax under the hood, tidypolars usually throws an error (it’s not necessary to understand what agrep() does here, you should only know that it is not translated by tidypolars):
dat <- pl$DataFrame(a = c("d", "e", "f"), foo = c(2, 1, 2))
dat |>
  filter(foo >= agrep("a", a))
#> Error in `filter()`:
#> ! `tidypolars` doesn't know how to translate this function: `agrep()`.
This is because foo >= agrep("a", a) uses the values from the columns a and foo, which are stored in the polars DataFrame. Therefore, polars needs a translation of agrep(), which doesn’t exist.
Now, let’s assume that we have this data instead:
dat <- pl$DataFrame(foo = c(2, 1, 2))
a <- c("d", "e", "f")
dat |>
  filter(foo >= agrep("a", a))
#> shape: (1, 1)
#> ┌─────┐
#> │ foo │
#> │ --- │
#> │ f64 │
#> ╞═════╡
#> │ 2.0 │
#> └─────┘
Starting from tidypolars 0.14.0, this doesn’t error anymore. Why is that? The reason is that while foo >= agrep("a", a) uses a column from the data (foo), agrep("a", a) no longer does since a is an object in the environment and not in the data anymore. Therefore, we can evaluate agrep("a", a) first and then use its result when evaluating foo >= agrep("a", a).
More generally, tidypolars checks whether a function uses columns from the data and skips the translation part if it doesn’t, allowing us to use more functions than what is translated by tidypolars (only if they don’t use columns as inputs).
Note that expressions that don’t use the data are evaluated in R before polars runs in the background, so they don’t benefit from polars’ parallel evaluation, for instance.
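To make the "does this expression use columns?" idea concrete, here is a minimal base R sketch. It is only an illustration of the principle, not how tidypolars implements the check: uses_columns() is a hypothetical helper, and it relies on a purely name-based test with all.vars().

```r
# Hypothetical helper (NOT a tidypolars function): returns TRUE if the
# expression mentions at least one of the given column names.
uses_columns <- function(expr, columns) {
  any(all.vars(expr) %in% columns)
}

columns <- "foo"          # column names of the data
a <- c("d", "e", "f")     # ordinary object in the environment, not a column

inner <- quote(agrep("a", a))
outer <- quote(foo >= agrep("a", a))

uses_columns(inner, columns)  # FALSE: can be evaluated in plain R first
uses_columns(outer, columns)  # TRUE: has to be handled against the data

# Evaluate the data-free part up front; only its value is needed afterwards.
threshold <- eval(inner)
```

A name-based check like this one would treat an environment object that shares its name with a column as a column, so it should be read as a sketch of the principle only.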
The example in the original post now works:
lf |>
  filter(date >= lubridate::today(tzone = "") - lubridate::days(3))
This is because lubridate::today(tzone = "") - lubridate::days(3) can be evaluated as-is, without using the data context, so tidypolars doesn't error anymore if it doesn't find today() or days() in the list of translated functions.