pandera: 0.18.3
pandas: 2.2.2
python: 3.9/3.11
Hi,
I am unable to set up pandera for a pandas DataFrame. Validation fails with:
  File "/anaconda/envs/data_quality_env/lib/python3.9/site-packages/pandera/api/base/schema.py", line 96, in get_backend
    raise BackendNotFoundError(
pandera.errors.BackendNotFoundError: Backend not found for backend, class: (<class 'data_validation.schemas.case.CaseSchema'>, <class 'pandas.core.frame.DataFrame'>). Looked up the following base classes: (<class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.generic.NDFrame'>, <class 'pandas.core.base.PandasObject'>, <class 'pandas.core.accessor.DirNamesMixin'>, <class 'pandas.core.indexing.IndexingMixin'>, <class 'pandas.core.arraylike.OpsMixin'>, <class 'object'>)
My folder structure is:
project/
    data_validation/
        schemas/
            case.py
        validation/
            validations.py
    pipeline.py
case.py:
import pandas as pd
import pandera as pa

class CaseSchema(pa.DataFrameSchema):
    case_id = pa.Column(pa.Int)
validations.py:
import pandas as pd
from data_validation.schemas.case import CaseSchema

def validate_case_data(df: pd.DataFrame) -> pd.DataFrame:
    """Validate a DataFrame against the CaseSchema."""
    schema = CaseSchema()
    return schema.validate(df)
pipeline.py:
import pandas as pd
from data_validation.validation.validations import validate_case_data

def validate_df(df: pd.DataFrame) -> pd.DataFrame:
    """Process data, validating it against the CaseSchema."""
    validated_df = validate_case_data(df)
    return validated_df

df = pd.DataFrame({"case_id": [1, 2, 3]})
processed_df = validate_df(df)
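This can be solved by overriding the get_backend classmethod in CaseSchema so that it registers the default backends and returns the pandas DataFrameSchemaBackend directly: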
import pandas as pd
import pandera as pa
from pandera.backends.pandas.container import DataFrameSchemaBackend

class CaseSchema(pa.DataFrameSchema):
    case_id = pa.Column(pa.Int)

    @classmethod
    def get_backend(cls, check_obj=None, check_type=None):
        # Work out which data class is being validated.
        if check_obj is not None:
            check_obj_cls = type(check_obj)
        elif check_type is not None:
            check_obj_cls = check_type
        else:
            raise ValueError("Must pass in one of `check_obj` or `check_type`.")
        # Register the default backends, then skip the registry lookup
        # and return the pandas DataFrame backend directly.
        cls.register_default_backends(check_obj_cls)
        return DataFrameSchemaBackend()
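To see why the override helps, here is a small toy model of a backend registry keyed by the pair (schema class, data class), which is what the error message above suggests the lookup uses. This is only a sketch under that assumption and not pandera's actual code: `Schema`, `Frame`, `FrameBackend`, `BackendNotFound`, `BrokenSchema` and `PatchedSchema` are hypothetical stand-ins for the real classes.

```python
import inspect


class BackendNotFound(Exception):
    """Stand-in for pandera.errors.BackendNotFoundError."""


class Frame:
    """Stand-in for pandas.DataFrame."""


class FrameBackend:
    """Stand-in for a validation backend."""


class Schema:
    """Toy schema with a registry keyed by (schema class, data class)."""

    REGISTRY = {}

    @classmethod
    def register_default_backends(cls, data_cls):
        # Assumption: defaults are registered under the base schema class only.
        Schema.REGISTRY[(Schema, Frame)] = FrameBackend

    @classmethod
    def get_backend(cls, check_obj):
        cls.register_default_backends(type(check_obj))
        # Walk the data object's base classes, but always look up under `cls`.
        for base in inspect.getmro(type(check_obj)):
            try:
                return cls.REGISTRY[(cls, base)]()
            except KeyError:
                continue
        raise BackendNotFound(f"no backend for {(cls, type(check_obj))}")


class BrokenSchema(Schema):
    """No override: the key (BrokenSchema, Frame) was never registered."""


class PatchedSchema(Schema):
    """Same idea as the workaround: return the backend without a lookup."""

    @classmethod
    def get_backend(cls, check_obj):
        cls.register_default_backends(type(check_obj))
        return FrameBackend()


base_backend = Schema.get_backend(Frame())
patched_backend = PatchedSchema.get_backend(Frame())
try:
    BrokenSchema.get_backend(Frame())
    broken_raised = False
except BackendNotFound:
    broken_raised = True
```

In this model the base class and the patched subclass both get a backend, while the unpatched subclass raises, which matches the traceback in the question. Separately, note that as far as I know DataFrameSchema does not collect class attributes such as case_id as columns, so the workaround may validate against an empty schema. The documented patterns are to pass a columns dict to pa.DataFrameSchema or to declare the schema as a pa.DataFrameModel, neither of which needs a get_backend override.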