I am trying to filter my Polars DataFrame for dates that are nearest to a given date.
For example:
import polars
import datetime
data = {
"date": ["2025-01-01", "2025-01-01", "2025-01-01", "2026-01-01"],
"value": [1, 2, 3, 4],
}
df = polars.DataFrame(data).with_columns([polars.col("date").cast(polars.Date)])
shape: (4, 2)
┌────────────┬───────┐
│ date ┆ value │
│ --- ┆ --- │
│ date ┆ i64 │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1 │
│ 2025-01-01 ┆ 2 │
│ 2025-01-01 ┆ 3 │
│ 2026-01-01 ┆ 4 │
└────────────┴───────┘
Given a date, say:
date = datetime.date(2024, 12, 31)
I want to filter the DataFrame for rows where the date column only includes records that are closest to my required date.
I know that I can do the following:
result = df.with_columns(
diff=(polars.col("date") - date).abs()
).filter(
polars.col("diff") == polars.min("diff")
)
shape: (3, 3)
┌────────────┬───────┬──────────────┐
│ date ┆ value ┆ diff │
│ --- ┆ --- ┆ --- │
│ date ┆ i64 ┆ duration[ms] │
╞════════════╪═══════╪══════════════╡
│ 2025-01-01 ┆ 1 ┆ 1d │
│ 2025-01-01 ┆ 2 ┆ 1d │
│ 2025-01-01 ┆ 3 ┆ 1d │
└────────────┴───────┴──────────────┘
Is there a more succinct way to achieve this (without creating a new column, for example)?
You don't need to add the temporary column, just filter
directly:
df.filter((m:=(pl.col('date')-date).abs()).min() == m)
Or, without the walrus operator:
diff = (pl.col('date')-date).abs()
df.filter(diff.min() == diff)
Output:
┌────────────┬───────┐
│ date ┆ value │
│ --- ┆ --- │
│ date ┆ i64 │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1 │
│ 2025-01-01 ┆ 2 │
│ 2025-01-01 ┆ 3 │
└────────────┴───────┘