python date datetime python-polars

Compare Polars DataFrames That Have a Polars Date Column


I want to test that two Polars DataFrame objects are equivalent when they contain a column that represents dates.

If I use datetime.date from the standard library, I don't have any problems:

import datetime as dt

import polars as pl
from polars.testing import assert_frame_equal

assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [dt.date(2000, 1, 1)]}))

But if I try to use the Date type from Polars, the comparison fails with a PanicException: not implemented.

assert_frame_equal(pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}), pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]}))

Is there a way to use the polars Date type in the DataFrame and still be able to compare the two objects?


Solution

  • I don't think you're supposed to use pl.Date like that. pl.Date is a data type, not a constructor for date values, so putting it in the data gives you a column of dtype object, which is probably not what you wanted:

    In [2]: pl.DataFrame({"foo": [1], "bar": [pl.Date(2000, 1, 1)]})
    Out[2]:
    shape: (1, 2)
    ┌─────┬─────────────────────────────────────┐
    │ foo ┆ bar                                 │
    │ --- ┆ ---                                 │
    │ i64 ┆ object                              │
    ╞═════╪═════════════════════════════════════╡
    │ 1   ┆ <polars.datatypes.Date object at... │
    └─────┴─────────────────────────────────────┘
    

    Instead, build the column from strings and parse it into dates:

    df1 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.to_date())
    df2 = pl.DataFrame({"foo": [1], "bar": ['2000-01-01']}).with_columns(pl.col('bar').str.to_date())
    
    assert_frame_equal(df1, df2)
    

    Both columns now have the Date dtype, and the comparison works fine.