pythondataframedatepython-polars

Filter dataframe by nearest date


I am trying to filter my Polars DataFrame for dates that are nearest to a given date.

For example:

import polars
import datetime

data = {
    "date": ["2025-01-01", "2025-01-01", "2025-01-01", "2026-01-01"],
    "value": [1, 2, 3, 4],
}

df = polars.DataFrame(data).with_columns([polars.col("date").cast(polars.Date)])
shape: (4, 2)
┌────────────┬───────┐
│ date       ┆ value │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2025-01-01 ┆ 1     │
│ 2025-01-01 ┆ 2     │
│ 2025-01-01 ┆ 3     │
│ 2026-01-01 ┆ 4     │
└────────────┴───────┘

Given a date, say:

date = datetime.date(2024, 12, 31)

I want to filter the DataFrame for rows where the date column only includes records that are closest to my required date.

I know that I can do the following:

result = df.with_columns(
    diff=(polars.col("date") - date).abs()
).filter(
    polars.col("diff") == polars.min("diff")
)
shape: (3, 3)
┌────────────┬───────┬──────────────┐
│ date       ┆ value ┆ diff         │
│ ---        ┆ ---   ┆ ---          │
│ date       ┆ i64   ┆ duration[ms] │
╞════════════╪═══════╪══════════════╡
│ 2025-01-01 ┆ 1     ┆ 1d           │
│ 2025-01-01 ┆ 2     ┆ 1d           │
│ 2025-01-01 ┆ 3     ┆ 1d           │
└────────────┴───────┴──────────────┘

Is there a more succinct way to achieve this (without creating a new column, for example)?


Solution

  • You don't need to add the temporary column, just filter directly:

    df.filter((m:=(pl.col('date')-date).abs()).min() == m)
    

    Or, without the walrus operator:

    diff = (pl.col('date')-date).abs()
    df.filter(diff.min() == diff)
    

    Output:

    ┌────────────┬───────┐
    │ date       ┆ value │
    │ ---        ┆ ---   │
    │ date       ┆ i64   │
    ╞════════════╪═══════╡
    │ 2025-01-01 ┆ 1     │
    │ 2025-01-01 ┆ 2     │
    │ 2025-01-01 ┆ 3     │
    └────────────┴───────┘