python great-expectations

Check column names and column types in Great Expectations


Currently, I am validating the table schema with expect_table_columns_to_match_set by feeding in a list of columns. However, I also want to validate the type of each column, such as string. The only Great Expectations rule I have found for this, expect_column_values_to_be_of_type, has to be written once per column, which is redundant because the column names are repeated.

Is there a rule I am missing that can validate both the column names and their types at the same time?

For example, given columns a: string, b: int, c: boolean, I want to pass that whole specification into one function instead of having to break it into [a, b, c] and then validate [a], string separately for each column.

Ideally, it would be something like expect_column_schema([(column_name_a, column_type_a), (column_name_b, column_type_b)]).


Solution

  • You can use expect_column_values_to_match_json_schema (or the regex / pattern variants, depending on what you are more comfortable with). See the list of available expectations for other options.

    With expect_column_values_to_match_json_schema you can define your schema in JSON format:

    schema = {
        "column_name_a": {"type": "string"},
        "column_name_b": {"type": "integer"},
        "column_name_c": {"type": "boolean"},
    }
    

    Create a new ExpectColumnValuesToMatchSchema instance (the import I used for that was from great_expectations.expectations.core.expect_column_values_to_match_schema import ExpectColumnValuesToMatchSchema):

    expectation = ExpectColumnValuesToMatchSchema(schema=schema)

    Finally, validate it to get your results: result = expectation.validate(dataset)

    This returns an ExpectationSuiteValidationResult, which you can inspect to see whether the columns you provided match the schema.
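
As far as I know, expect_column_values_to_match_json_schema checks JSON documents stored as values inside a single column, so it may not cover the columns' data types. A fallback that avoids repeating the column names is to keep one name-to-type mapping and generate the two expectations named in the question from it. The sketch below is only that: expect_column_schema is a hypothetical helper, not part of Great Expectations, and validator is assumed to be any object exposing expect_table_columns_to_match_set and expect_column_values_to_be_of_type as methods (for example a Validator in pre-1.0 releases).

```python
from typing import Any, Dict, List


def expect_column_schema(validator: Any, schema: Dict[str, str]) -> List[Any]:
    """Check column names once, then each column's type, from one mapping.

    Hypothetical helper: `validator` is assumed to expose the two
    expectation methods named in the question.
    """
    results = [
        # Column names: the table must contain exactly the keys of `schema`.
        validator.expect_table_columns_to_match_set(
            column_set=list(schema), exact_match=True
        )
    ]
    for column, type_name in schema.items():
        # Column types: one expectation per column, generated from the mapping.
        results.append(
            validator.expect_column_values_to_be_of_type(
                column=column, type_=type_name
            )
        )
    return results
```

A call would then look like expect_column_schema(validator, {"a": "str", "b": "int", "c": "bool"}); a list of (name, type) pairs can be passed through dict(...) first. The type names here are placeholders: the strings accepted by type_ depend on the execution backend (pandas, Spark, SQL) and on the Great Expectations version, so check them against your setup.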