python pandas validation great-expectations

How to run Great Expectations expectations on multiple columns?


I want to use the Great Expectations testing suite to run the same validations on many columns. I see that there's a closed feature request to have this as a built-in expectation, but can this be done with a for-loop over the column names?

In addition, I need to filter which columns to test: I am training several computer vision models on different class ids, so I need to select all columns that correspond to class ids.


Solution

  • Unfortunately, the docs don't document filter() for batches, but type(batch) shows that it is a great_expectations.dataset.pandas_dataset.PandasDataset, which according to the docs subclasses pandas.DataFrame.

    So you can filter columns as you would with a regular DataFrame, using batch.filter(), and then run a for loop over the resulting column names:

    [Screenshot: Expectations on filtered columns]

    There's a gotcha, though: you can't run the expectations directly on the filtered DataFrame. Run them on the original batch dataset instead; otherwise you will get errors when you call filtered_df.save_expectation_suite().

    [Screenshot: Expectation results]
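Since the original code only survives as screenshots, here is a minimal sketch of the loop described above, not the exact code from the answer. It assumes batch is a legacy PandasDataset, so batch.columns lists the column names and each expectation is a method on batch. The helper name expect_on_matching_columns and the "class_<n>" column naming are hypothetical. The helper picks column names with a regex (roughly what batch.filter(regex=...).columns would give you) and calls the expectation on the original batch, which sidesteps the gotcha with the filtered DataFrame.

```python
import re


def expect_on_matching_columns(batch, pattern, expectation_name, **kwargs):
    """Run one expectation on every column of `batch` whose name matches `pattern`.

    Assumes `batch` exposes `.columns` and has the expectation as a method,
    as a legacy PandasDataset does.
    """
    # Select column names only; no filtered copy of the dataset is created.
    columns = [col for col in batch.columns if re.search(pattern, str(col))]
    # Look the expectation up on the original batch, so the results land in
    # the suite that the original batch will save.
    expectation = getattr(batch, expectation_name)
    return {col: expectation(col, **kwargs) for col in columns}


# Hypothetical usage, assuming columns named like "class_0", "class_1", ...:
# results = expect_on_matching_columns(
#     batch, r"^class_\d+$", "expect_column_values_to_be_between",
#     min_value=0, max_value=1,
# )
# batch.save_expectation_suite()
```

The returned dict maps each matched column name to whatever the expectation returned, so you can inspect per-column results before saving the suite.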