
How to select rows between a certain date range in python-polars?


Given a DataFrame constructed as follows with python-polars:

import polars as pl
from polars import col
from datetime import datetime

df = pl.DataFrame({
    "dates": ["2016-07-02", "2016-08-10",  "2016-08-31", "2016-09-10"],
    "values": [1, 2, 3, 4]
})

How can I select the rows within a certain date range, e.g. between "2016-08-10" and "2016-08-31" inclusive, so that the desired outcome is:

┌────────────┬────────┐
│ dates      ┆ values │
│ ---        ┆ ---    │
│ date       ┆ i64    │
╞════════════╪════════╡
│ 2016-08-10 ┆ 2      │
│ 2016-08-31 ┆ 3      │
└────────────┴────────┘

Solution

  • First convert the string values to the Date type, then filter with is_between:

    # eager
    (df.with_columns(pl.col("dates").str.to_date()) 
     .filter(col("dates").is_between(datetime(2016, 8, 9), datetime(2016, 9, 1)))
    )
    
    # lazy
    (df.lazy()
     .with_columns(pl.col("dates").str.to_date()) 
     .filter(col("dates").is_between(datetime(2016, 8, 9), datetime(2016, 9, 1)))
     .collect()
    )
    

    Both produce the desired output:

    ┌────────────┬────────┐
    │ dates      ┆ values │
    │ ---        ┆ ---    │
    │ date       ┆ i64    │
    ╞════════════╪════════╡
    │ 2016-08-10 ┆ 2      │
    │ 2016-08-31 ┆ 3      │
    └────────────┴────────┘