Tags: python, pandas, python-typing, mypy, pandera

How to annotate a Pandas index of datetime.date values using Pandera and mypy?


I'm using Pandera to define a schema for a pandas DataFrame where the index represents calendar dates (without time). I want to type-annotate the index as holding datetime.date values. Here's what I tried:

# mypy.ini
[mypy]
plugins = pandera.mypy

# schema.py
from datetime import date
import pandera as pa
from pandera.typing import Index

class DateIndexModel(pa.DataFrameModel):
    date: Index[date]

But running mypy gives the following error:

error: Type argument "date" of "Index" must be a subtype of "bool | int | str | float | ExtensionDtype | <30 more items>"  [type-var]
Found 1 error in 1 file (checked 1 source file)

I know that datetime64[ns] or pandas.Timestamp work fine, but I specifically want to model just dates without time. Is there a type-safe way to do this with Pandera and mypy?

Any workaround that lets me enforce date-only index semantics (with or without datetime.date) while keeping mypy happy?

Colab example notebook:
https://colab.research.google.com/drive/1AdiztxHlyvEMo6B3CzYnvzlnh6a0GfUQ?usp=sharing


Solution

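  • TL;DR: use Index[pa.engines.pandas_engine.Date]

    Pandera does not currently accept datetime.date as the type argument of a Series or Index annotation. It is not one of the types allowed by the type variable that pandera.typing.Index is generic over, which is exactly what the mypy error reports. However, pandera provides a semantic representation of a date-typed column for each backend (pandas, polars, pyarrow, etc.). For pandas DataFrames the date type is pa.engines.pandas_engine.Date; for the other backends, see the API docs.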
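One reason a dedicated semantic type is needed: pandas itself has no native "date" dtype. The short pandas-only sketch below (the dates are arbitrary sample values) shows that an index of datetime.date objects is stored with the generic object dtype, while converting it to datetime64 gives back timestamps with a midnight time component.

```python
from datetime import date

import pandas as pd

# pandas has no dedicated "date" dtype: an index built from datetime.date
# objects falls back to the generic object dtype.
idx = pd.Index([date(2024, 1, 1), date(2024, 1, 2)], name="date")
print(idx.dtype)  # object

# The elements are still plain datetime.date objects (no time component).
print(all(type(d) is date for d in idx))  # True

# Converting to datetime64 turns each date into a midnight timestamp,
# i.e. a time component is reintroduced.
dt_idx = pd.to_datetime(idx)
print(isinstance(dt_idx, pd.DatetimeIndex))  # True
print(dt_idx[0])  # 2024-01-01 00:00:00
```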

    From the pandera documentation:

    class pandera.engines.pandas_engine.Date(to_datetime_kwargs=None)

    Semantic representation of a date data type.

    # schema.py
    import pandera as pa
    from pandera.typing import Index
    
    class DateIndexModel(pa.DataFrameModel):
        date: Index[pa.engines.pandas_engine.Date]