[SOLVED] Conditional Expectations contains/like functionality and error (great expectations)

I am trying to add a conditional expectation that checks if the column "Value" is not equal to zero but only for a subset of the dataset where the column "Condition" contains the string "A".

I have two problems:

1. I don't know how to implement the contains/like functionality, i.e. a condition that matches rows where the "Condition" column contains the string "A".

2. Even when I use the equality-based examples from the internet, I currently get the following error message:

 df.expect_column_values_to_not_be_in_set(
     column='Value',
     value_set=[0],
     row_condition='Condition=="A"',
     result_format="SUMMARY"
 )

TypeError: expect_column_values_to_not_be_in_set() got an unexpected keyword argument 'row_condition'

(The df is a Delta table read from a file path and wrapped with SparkDFDataset, imported via: from great_expectations.dataset.sparkdf_dataset import SparkDFDataset)

Thank you very much in advance!

I also tried passing the condition_parser argument, but I got the same error message.

These are the links I used to come up with my code:
https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations/#data-docs-and-conditional-expectations
https://legacy.docs.greatexpectations.io/en/latest/reference/conditional_expectations.html

Solution

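Try the code below, adapted to your dataset. It converts the Spark DataFrame to pandas first, so the expectation accepts row_condition together with condition_parser='pandas'.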

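import great_expectations as gx

# Read the CSV into a Spark DataFrame. inferSchema keeps numeric columns numeric;
# without it every column is read as a string and value_set=[0] would never match.
df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/FileStore/tables/source1_data.csv")
display(df)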

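# Convert to pandas and wrap it as a Great Expectations dataset (legacy Dataset API)
pandas_df = df.toPandas()
finalDF = gx.from_pandas(pandas_df)

# Check 'level' only on the rows where line_code equals "D0203"
finalDF.expect_column_values_to_not_be_in_set(
    column='level',
    value_set=[0],
    row_condition='line_code=="D0203"',
    condition_parser='pandas',
    result_format="SUMMARY"
)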
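
The snippet above uses an equality condition, so it does not cover the contains/like part of the question. One possible approach, shown here as a minimal pandas-only sketch rather than a confirmed Great Expectations feature, is to precompute a boolean helper column with str.contains and condition on that column instead. The sample data and the helper column name condition_has_a are hypothetical; the column names "Condition" and "Value" come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
pandas_df = pd.DataFrame(
    {
        "Condition": ["A1", "B2", "xA", "C3"],
        "Value": [5, 0, 0, 7],
    }
)

# Boolean helper column (hypothetical name): True where "Condition" contains "A".
# regex=False treats "A" as a literal substring; na=False makes missing values non-matching.
pandas_df["condition_has_a"] = pandas_df["Condition"].str.contains(
    "A", regex=False, na=False
)

# Plain-pandas equivalent of the conditional check:
# within the matching subset, find the rows whose Value is in the forbidden set [0].
subset = pandas_df[pandas_df["condition_has_a"]]
violations = subset[subset["Value"].isin([0])]
print(violations)
```

With the helper column in place, the expectation from the answer could presumably be reused with row_condition='condition_has_a==True' and condition_parser='pandas'. That combination is an assumption based on the equality example above and has not been verified here.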