Tags: pandas, pyspark, databricks, azure-databricks, great-expectations

Conditional Expectations contains/like functionality and error (great expectations)


I am trying to add a conditional expectation that checks if the column "Value" is not equal to zero but only for a subset of the dataset where the column "Condition" contains the string "A".

I have two problems:

  1. I don't know how to implement the contains/like functionality with the "Condition" column that should contain the string "A"

  2. Even if I use the examples with the equal sign from the internet, I currently get the following error message:

     df.expect_column_values_to_not_be_in_set(
         column='Value',
         value_set=[0],
         row_condition='Condition=="A"',
         result_format="SUMMARY"
     )
    

TypeError: expect_column_values_to_not_be_in_set() got an unexpected keyword argument 'row_condition'

(df is a Delta table, read from a file path and wrapped in SparkDFDataset, which I import with: from great_expectations.dataset.sparkdf_dataset import SparkDFDataset)

Thank you very much in advance!

I also tried it with the condition_parser but I got the same error message.

These are the links I used to come up with my code:

https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations/#data-docs-and-conditional-expectations

https://legacy.docs.greatexpectations.io/en/latest/reference/conditional_expectations.html


Solution

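  • Try the code below, which converts the Spark DataFrame to pandas and passes row_condition together with condition_parser='pandas'. Adjust the file path and column names to your dataset.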

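    import great_expectations as gx
    # Without inferSchema every CSV column is read as a string, so a comparison against the number 0 would never match
    df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/FileStore/tables/source1_data.csv")
    display(df)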
    


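    # toPandas() collects all rows to the driver, so this only suits data that fits in memory
    pandas_df = df.toPandas()
    finalDF = gx.from_pandas(pandas_df)
    # Check that 'level' is never 0, but only on rows where line_code equals "D0203"
    finalDF.expect_column_values_to_not_be_in_set(
        column='level',
        value_set=[0],
        row_condition='line_code=="D0203"',
        condition_parser='pandas',
        result_format="SUMMARY"
    )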
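
The code above uses an equality condition, so it does not cover the contains/like part of the question. As a rough sketch of what that check means, here it is in plain pandas, without Great Expectations. The sample data is made up, and the column names "Condition" and "Value" are taken from the question. Whether the same string expression is accepted inside row_condition depends on the Great Expectations and pandas versions, so that part is left as an assumption to verify.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame(
    {
        "Condition": ["A1", "B2", "xA", None, "C3"],
        "Value": [5, 0, 0, 0, 7],
    }
)

# Rows whose "Condition" contains the substring "A".
# regex=False treats "A" as a literal; na=False treats missing values as no match.
subset = df[df["Condition"].str.contains("A", regex=False, na=False)]

# Within that subset, rows that violate "Value must not be 0".
violations = subset[subset["Value"].isin([0])]

print(f"rows checked: {len(subset)}, violations: {len(violations)}")
```

The zero values in rows whose "Condition" does not contain "A" are ignored, which is the behaviour a conditional expectation is meant to give.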
    
