python datetime python-polars

Iterate over groups created using group_by on a column of date lists


I am new to Polars. I want to iterate over the groups created by grouping on a column in which each cell contains a list of two dates. I used the following (sample) code to do this, and it worked fine with polars==0.20.18:

import polars as pl
import datetime

dt_str = [{'ts': datetime.date(2023, 7, 1), 'files': 'AGG_202307.xlsx',
           'period_bins': [datetime.date(2023, 7, 1), datetime.date(2024, 1, 1)]},
          {'ts': datetime.date(2023, 8, 1), 'files': 'AGG_202308.xlsx',
           'period_bins': [datetime.date(2023, 7, 1), datetime.date(2024, 1, 1)]},
          {'ts': datetime.date(2023, 11, 1), 'files': 'KFC_202311.xlsx',
           'period_bins': [datetime.date(2023, 7, 1), datetime.date(2024, 1, 1)]},
          {'ts': datetime.date(2024, 2, 1), 'files': 'KFC_202402.xlsx',
           'period_bins': [datetime.date(2024, 1, 1), datetime.date(2024, 7, 1)]}]

dt = pl.from_dicts(dt_str)

df_groups = dt.group_by("period_bins")
print(df_groups.all().to_dicts())

The same code no longer works with polars 1.x and fails with the following error:

thread 'polars-0' panicked at crates/polars-row/src/encode.rs:289:15:
not implemented: Date32
thread 'polars-1' panicked at crates/polars-row/src/encode.rs:289:15:
not implemented: Date32
Traceback (most recent call last):
  File "testpad.py", line 18, in <module>
    print(df_groups.all().to_dicts())
  File "python3.10/site-packages/polars/dataframe/group_by.py", line 430, in all
    return self.agg(F.all())
  File "python3.10/site-packages/polars/dataframe/group_by.py", line 228, in agg
    self.df.lazy()
  File "python3.10/site-packages/polars/lazyframe/frame.py", line 2027, in collect
    return wrap_df(ldf.collect(callback))
pyo3_runtime.PanicException: not implemented: Date32

How do I fix this error?


Solution

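  • As a workaround, you can group by the .hash() of the list column (or by a cast of it) instead of by the list of dates itself. The panic is raised in the row encoder (polars-row) while it encodes the list[date] group key; a u64 hash key contains no dates, so that code path is never hit.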

    # Group by a u64 hash of the list column instead of the list[date] itself
    (dt.group_by(pl.col("period_bins").hash().alias("key"))
       .all()
    )
    
    shape: (2, 4)
    ┌─────────────────────┬─────────────────────────────────┬─────────────────────────────────┬─────────────────────────────────┐
    │ key                 ┆ ts                              ┆ files                           ┆ period_bins                     │
    │ ---                 ┆ ---                             ┆ ---                             ┆ ---                             │
    │ u64                 ┆ list[date]                      ┆ list[str]                       ┆ list[list[date]]                │
    ╞═════════════════════╪═════════════════════════════════╪═════════════════════════════════╪═════════════════════════════════╡
    │ 6836989170623494942 ┆ [2023-07-01, 2023-08-01, 2023-… ┆ ["AGG_202307.xlsx", "AGG_20230… ┆ [[2023-07-01, 2024-01-01], [20… │
    │ 2692156858231355433 ┆ [2024-02-01]                    ┆ ["KFC_202402.xlsx"]             ┆ [[2024-01-01, 2024-07-01]]      │
    └─────────────────────┴─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────┘