pythonarraysstructpython-polarsdate-range

Polars: Create a new column with an array of dates from two different date columns storing the start and end of the range


I have a dataframe with a date start and date end (Start Promotion and End Promotion). I would like to use Python Polars to create a new column (from the previously mentioned columns) that will be an array or struct object. The purpose in this is so that the column can be exploded creating a value for each date that can be joined to another data frame later.

This is an example data frame:

Start Promotion End Promotion Price
2024-05-01 2024-05-05 12
2023-12-30 2024-01-01 10

The desired end result is:

Start Promotion End Promotion Price Promotion Dates
2024-05-01 2024-05-05 12 ["2024-05-01", "2024-05-02", "2024-05-03", "2024-05-04", "2024-05-05"]
2023-12-30 2024-01-01 10 ["2023-12-30", "2023-12-31", "2024-01-01"]

Here is an example of what it should look like if exploded:

Start Promotion End Promotion Price Promotion Dates
2024-05-01 2024-05-05 12 "2024-05-01"
2024-05-01 2024-05-05 12 "2024-05-02"
2024-05-01 2024-05-05 12 "2024-05-03"
2024-05-01 2024-05-05 12 "2024-05-04"
2024-05-01 2024-05-05 12 "2024-05-05"
2023-12-30 2024-01-01 10 "2023-12-30"
2023-12-30 2024-01-01 10 "2023-12-31"
2023-12-30 2024-01-01 10 "2024-01-01"

Here is an example data frame to attempt it with:

import polars as pl

data = {
    "Start Promotion": ["2024-05-01", "2023-12-30"],
    "End Promotion": ["2024-05-05", "2024-01-01"],
    "Price": [12, 10]
}

df = pl.DataFrame(data)

Let me know if you have any clarifying questions.


Solution

  • you can use .date_ranges() for that:

    (
        df
        .with_columns(
            pl.date_ranges("Start Promotion", "End Promotion").alias("Promotion Dates")
        )
    )
    
    ┌─────────────────┬───────────────┬───────┬─────────────────────────────────┐
    │ Start Promotion ┆ End Promotion ┆ Price ┆ Promotion Dates                 │
    │ ---             ┆ ---           ┆ ---   ┆ ---                             │
    │ date            ┆ date          ┆ i64   ┆ list[date]                      │
    ╞═════════════════╪═══════════════╪═══════╪═════════════════════════════════╡
    │ 2024-05-01      ┆ 2024-05-05    ┆ 12    ┆ [2024-05-01, 2024-05-02, … 202… │
    │ 2023-12-30      ┆ 2024-01-01    ┆ 10    ┆ [2023-12-30, 2023-12-31, 2024-… │
    └─────────────────┴───────────────┴───────┴─────────────────────────────────┘