I want to use the Great Expectations testing suite to run the same validations on many columns. I see that there's a closed feature request to have this as a built-in expectation, but can this be done with a for-loop over the column names?
In addition, I need to filter which columns to test: I am training various computer vision models on different class ids, so I need to select all columns corresponding to class ids.
Unfortunately, if you search the docs for filter() there isn't anything documented, but if you check type(batch) you see that it's a great_expectations.dataset.pandas_dataset.PandasDataset, which according to the docs subclasses pandas.DataFrame.
So, you can filter columns as you would a regular dataframe using batch.filter() and run a for loop over the resulting columns.
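As a rough sketch of that loop: the class_id_ column prefix and the two expectations below are placeholders I made up for illustration, not anything from the question, and batch is assumed to be the legacy PandasDataset described above. The filter is used only to collect column names.

```python
import pandas as pd


def class_id_columns(df: pd.DataFrame, pattern: str = r"^class_id_") -> list:
    """Return the names of the columns whose name matches `pattern`.

    The `class_id_` prefix is a hypothetical naming scheme; adjust the
    regex to whatever your class-id columns are actually called.
    """
    # DataFrame.filter selects columns by label; only the names are kept.
    return list(df.filter(regex=pattern).columns)


def add_class_id_expectations(batch, columns) -> None:
    """Apply the same expectations to every column in `columns`.

    `batch` is assumed to be a legacy great_expectations PandasDataset.
    The two expectations and the 0..1 range are placeholders; swap in
    the validations you actually need.
    """
    for column in columns:
        batch.expect_column_values_to_not_be_null(column)
        batch.expect_column_values_to_be_between(column, min_value=0, max_value=1)
```

With a real batch this would be called as add_class_id_expectations(batch, class_id_columns(batch)), and batch.save_expectation_suite() would then be called on that same batch object.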
There's a gotcha, though: you can't run the expectations directly on the filtered DataFrame; instead, you have to run the expectations on the original batch dataset, or else you will get errors when you try to do filtered_df.save_expectation_suite().