I'm using Pandera to define a schema for a pandas DataFrame where the index represents calendar dates (without time). I want to type-annotate the index as holding datetime.date values. Here's what I tried:
# mypy.ini
[mypy]
plugins = pandera.mypy
# schema.py
from datetime import date
import pandera as pa
from pandera.typing import Index
class DateIndexModel(pa.DataFrameModel):
    date: Index[date]
But running mypy gives the following error:
error: Type argument "date" of "Index" must be a subtype of "bool | int | str | float | ExtensionDtype | <30 more items>" [type-var]
Found 1 error in 1 file (checked 1 source file)
I know that datetime64[ns] or pandas.Timestamp work fine, but I specifically want to model just dates without time. Is there a type-safe way to do this with Pandera and mypy?
Any workaround that lets me enforce date-only index semantics (with or without datetime.date) while keeping mypy happy?
Colab example notebook:
https://colab.research.google.com/drive/1AdiztxHlyvEMo6B3CzYnvzlnh6a0GfUQ?usp=sharing
TL;DR: use Index[pa.engines.pandas_engine.Date].
Pandera does not currently support datetime.date as a series data type, but it has a semantic representation of a date-typed column for each supported library (pandas, polars, pyarrow, etc.). For pandas.DataFrame objects, that type is pa.engines.pandas_engine.Date; for the other libraries, see the API docs.
From the pandera documentation:
class pandera.engines.pandas_engine.Date(to_datetime_kwargs=None)
Semantic representation of a date data type.
# schema.py
import pandera as pa
from pandera.typing import Index
class DateIndexModel(pa.DataFrameModel):
    date: Index[pa.engines.pandas_engine.Date]