If I have a data frame
import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]})
I can use Great Expectations to check the names and dtypes of the columns like so:
import great_expectations as ge
df_asset = ge.from_pandas(df)
# List of expectations
df_asset.expect_column_to_exist('A')
df_asset.expect_column_to_exist('B')
df_asset.expect_column_values_to_be_of_type('A', 'float')
df_asset.expect_column_values_to_be_of_type('B', 'float')
if df_asset.validate()["success"]:
    print("Validation passed")
else:
    print("Validation failed")
But how can I do something similar to check the index of the data frame? That is, if the data frame were instead
df = pd.DataFrame({'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]}).set_index('A')
I am looking for something like
df_asset.expect_index_to_exist('idx')
df_asset.expect_index_values_to_be_of_type('idx', 'float')
to use in the list of expectations in place of the corresponding column checks.
One quick hack is to use .reset_index() to convert the index into a regular column:
import great_expectations as ge
df_asset = ge.from_pandas(df.reset_index())
# List of expectations
df_asset.expect_column_to_exist('A')
df_asset.expect_column_to_exist('B')
df_asset.expect_column_values_to_be_of_type('A', 'float')
df_asset.expect_column_values_to_be_of_type('B', 'float')
# index-related expectations
df_asset.expect_column_to_exist('index')
df_asset.expect_column_values_to_be_of_type('index', 'int')
if df_asset.validate()["success"]:
    print("Validation passed")
else:
    print("Validation failed")
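For the data frame in the question, where the index was created with set_index('A'), the index already has a name, so reset_index() should give back a column called 'A' rather than 'index'. The index-related expectations would then target 'A' with type 'float'. Below is a minimal pandas-only sketch of that behaviour; the variable names (flat, index_column, index_is_float) are just for illustration:

```python
import pandas as pd

# Data frame from the question: column 'A' becomes the (named) index.
df = pd.DataFrame({'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]}).set_index('A')

# reset_index() turns a named index back into a column with the same name.
flat = df.reset_index()

index_column = df.index.name   # name the index-related expectations should use
columns = list(flat.columns)   # columns that Great Expectations would see
index_is_float = pd.api.types.is_float_dtype(flat[index_column])
```

With this data frame, the expectations from the question can stay as they are once df.reset_index() is passed to ge.from_pandas.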
Note that the default name for an unnamed index is 'index', but you can also control it with the names keyword argument (this requires pandas>=1.5.0). Here is an example:
df_asset = ge.from_pandas(df.reset_index(names='custom_index_name'))
This can be useful when you want to avoid clashes with existing column names. The same approach works for a MultiIndex: provide one custom name per index level, for example as a list.
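As a rough sketch of the MultiIndex case, assuming pandas>=1.5.0 and using made-up level names ('level_a', 'level_b') and index values purely for illustration:

```python
import pandas as pd

# Hypothetical two-level index; the values are only for illustration.
df = pd.DataFrame(
    {'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]},
    index=pd.MultiIndex.from_tuples([(0, 'x'), (0, 'y'), (1, 'x')]),
)

# One custom name per index level, passed as a list.
flat = df.reset_index(names=['level_a', 'level_b'])

columns = list(flat.columns)
first_level_is_int = pd.api.types.is_integer_dtype(flat['level_a'])
```

Each level then shows up as an ordinary column, so it can be checked with expect_column_to_exist and expect_column_values_to_be_of_type just like 'A' and 'B'.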