Currently, I am validating the table schema with `expect_table_columns_to_match_set` by feeding in a list of column names. However, I also want to validate the data type associated with each column (e.g. string). The only Great Expectations rule I have found for this, `expect_column_values_to_be_of_type`, has to be written once per column, which is redundant because every column name gets repeated a second time.
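For illustration, this is roughly what I have today (a sketch using the legacy `ge.from_pandas` dataset API; the column names and type strings are placeholders and the accepted type strings depend on the execution engine):

```python
import great_expectations as ge
import pandas as pd

# Placeholder data; my real table has many more columns.
df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2], "c": [True, False]})
dataset = ge.from_pandas(df)  # legacy PandasDataset API

# Column names validated once here...
dataset.expect_table_columns_to_match_set(column_set=["a", "b", "c"])

# ...and then repeated again, one call per column, for the type checks
# (type strings such as "str"/"int64"/"bool" depend on the backend).
dataset.expect_column_values_to_be_of_type(column="a", type_="str")
dataset.expect_column_values_to_be_of_type(column="b", type_="int64")
dataset.expect_column_values_to_be_of_type(column="c", type_="bool")
```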
Is there a rule I am missing that lets me validate both the column name and its type at the same time?
For example, given columns `a: string`, `b: int`, `c: boolean`, I want to pass that whole mapping into one function instead of having to break it into `[a, b, c]` and then validate `a` / `string` separately for each column.
Ideally, it would be something like `expect_column_schema([(column_name_a, column_type_a), (column_name_b, column_type_b)])`.
You can use `expect_column_values_to_match_json_schema` (or a regex/pattern based expectation, depending on what you are more comfortable with); the Expectation Gallery lists all of the expectations that are available.
With `expect_column_values_to_match_json_schema` you can describe each column's expected type as a small JSON schema:
schema = {
"column_name_a": {"type": "string"},
"column_name_b": {"type": "integer"},
"column_name_c": {"type": "boolean"},
}
The expectation checks the values of a single column, so create one `ExpectColumnValuesToMatchJsonSchema` instance per column (in recent versions the import is `from great_expectations.expectations.core.expect_column_values_to_match_json_schema import ExpectColumnValuesToMatchJsonSchema`):

expectation = ExpectColumnValuesToMatchJsonSchema(
    column="column_name_a",
    json_schema=schema["column_name_a"],
)

Finally, validate it against your data to get your results: `result = batch.validate(expectation)`, where `batch` is the Batch you are validating.
You will get an `ExpectationValidationResult` back for each expectation (an `ExpectationSuiteValidationResult` if you run a whole suite of them) and can check whether the columns you provided match the schema.
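To tie it together, here is a minimal sketch of the loop that avoids repeating column names, assuming the GX 1.x fluent API; the data source, asset, and batch-definition names plus the sample dataframe are illustrative, and which JSON types pass depends on how your backend represents the values:

```python
import great_expectations as gx
import pandas as pd

# Illustrative dataframe; replace with your real data.
df = pd.DataFrame(
    {
        "column_name_a": ["x", "y"],
        "column_name_b": [1, 2],
        "column_name_c": [True, False],
    }
)

# GX 1.x fluent setup; the datasource/asset/batch-definition names are arbitrary.
context = gx.get_context()
batch = (
    context.data_sources.add_pandas("pandas_src")
    .add_dataframe_asset("my_asset")
    .add_batch_definition_whole_dataframe("my_batch")
    .get_batch(batch_parameters={"dataframe": df})
)

# Column names and types live in one place.
schema = {
    "column_name_a": {"type": "string"},
    "column_name_b": {"type": "integer"},
    "column_name_c": {"type": "boolean"},
}

# Check the column names once...
names_ok = batch.validate(
    gx.expectations.ExpectTableColumnsToMatchSet(column_set=list(schema))
).success

# ...then one type expectation per column, driven by the same dict.
type_results = {}
for column, json_schema in schema.items():
    expectation = gx.expectations.ExpectColumnValuesToMatchJsonSchema(
        column=column, json_schema=json_schema
    )
    type_results[column] = batch.validate(expectation).success

print(names_ok, type_results)
```

If you only care about dtypes rather than JSON schemas, the same loop pattern should also work with `gx.expectations.ExpectColumnValuesToBeOfType(column=column, type_=dtype)` driven by a name-to-dtype dict.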