I am trying to add a conditional expectation that checks if the column "Value" is not equal to zero but only for a subset of the dataset where the column "Condition" contains the string "A".
I have two problems:
1. I don't know how to implement contains/like matching on the "Condition" column, which should contain the string "A".
2. Even when I use the equality-based examples from the internet, I currently get the following error message:
df.expect_column_values_to_not_be_in_set(
    column='Value',
    value_set=[0],
    row_condition='Condition=="A"',
    result_format="SUMMARY"
)
TypeError: expect_column_values_to_not_be_in_set() got an unexpected keyword argument 'row_condition'
(df is a Delta table read from a file path and wrapped with SparkDFDataset, imported via from great_expectations.dataset.sparkdf_dataset import SparkDFDataset.)
Thank you very much in advance!
I also tried it with the condition_parser but I got the same error message.
These are the links I used to come up with my code:
https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations/#data-docs-and-conditional-expectations
https://legacy.docs.greatexpectations.io/en/latest/reference/conditional_expectations.html
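Try the code below, adapted to your dataset. It converts the Spark DataFrame to pandas and passes condition_parser='pandas', so the conditional expectation runs on a pandas-backed dataset. Note that toPandas() collects all rows onto the driver, so this only suits data that fits in memory.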
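import great_expectations as gx

# Read the CSV with Spark. inferSchema keeps numeric columns such as "level"
# from being read as strings, which would never match value_set=[0].
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/source1_data.csv")
display(df)  # Databricks notebook helper; optional

# Convert to pandas and wrap the result as a Great Expectations dataset.
pandas_df = df.toPandas()
finalDF = gx.from_pandas(pandas_df)

# Check "level" only on rows where line_code equals "D0203".
finalDF.expect_column_values_to_not_be_in_set(
    column='level',
    value_set=[0],
    row_condition='line_code=="D0203"',
    condition_parser='pandas',
    result_format="SUMMARY"
)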
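For the contains/like part of the question, here is a minimal plain-pandas sketch of what such a condition selects. The sample data is hypothetical and only reuses the column names from the question. The query string at the end is the kind of expression one could try as row_condition with condition_parser='pandas'; whether a given Great Expectations version accepts it is an assumption that is not verified here.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame(
    {
        "Condition": ["A", "BA", "B", "xAy", "C"],
        "Value": [1, 0, 0, 5, 2],
    }
)

# Substring match ("contains") rather than equality.
# regex=False treats "A" as a literal string; na=False skips missing values.
mask = df["Condition"].str.contains("A", regex=False, na=False)
subset = df[mask]

# Rows inside the subset that violate "Value is not 0".
violations = subset[subset["Value"] == 0]

# The same filter written as a query string. engine="python" is passed
# explicitly because .str methods may not evaluate under the numexpr engine.
subset_via_query = df.query('Condition.str.contains("A")', engine="python")

print(subset)
print(violations)
```

If the query string is not accepted as a row_condition, filtering the DataFrame first, as with mask above, and running the unconditional expectation on the filtered data checks the same rows.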