I'm using Amazon Deequ in Spark, and I have a DataFrame 'df' with two columns of type 'Long' (or another numeric type). I simply want to check that, for all rows,
value(column1) lies between value(column2)-20% and value(column2)+20%
I'm not sure what check to put here:
val verificationResult: VerificationResult = {
  VerificationSuite()
    .onData(df)
    .addCheck(
      Check(CheckLevel.Error, "Review Check")
      //.functionToCheckThis()
    )
    .run()
}
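Check has a method satisfies, which takes a column condition (a SQL expression passed as a string) as its first parameter and a constraint name as its second. With the default assertion, the check passes only if every row satisfies the condition.
To check whether column1 is between column2 - 20% and column2 + 20%, you can use an expression like
|column1 - column2| <= 0.20*column2
or
column1 between 0.80*column2 and 1.20*column2
For example: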
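import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel}

val verificationResult: VerificationResult = {
  VerificationSuite()
    .onData(df)
    .addCheck(
      Check(CheckLevel.Error, "Review Check")
        .satisfies(
          // SQL condition evaluated for each row
          "abs(column1 - column2) <= 0.20 * column2",
          // constraint name reported in the results
          "value(column1) lies between value(column2)-20% and value(column2)+20%"
        )
    )
    .run()
}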
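To sanity-check the condition before running it through Deequ, here is a minimal plain-Scala sketch of the two expressions. It does not use Spark or Deequ. The object name ToleranceCheck, the helpers withinAbs and withinBetween, and the sample rows are all made up for illustration. BigDecimal is used so that boundary values such as exactly 80 or 120 are compared without floating-point rounding.

```scala
object ToleranceCheck {
  // Hypothetical helper mirroring: abs(column1 - column2) <= 0.20 * column2
  def withinAbs(c1: Long, c2: Long): Boolean =
    (BigDecimal(c1) - BigDecimal(c2)).abs <= BigDecimal("0.20") * BigDecimal(c2)

  // Hypothetical helper mirroring: column1 between 0.80*column2 and 1.20*column2
  // (both bounds inclusive)
  def withinBetween(c1: Long, c2: Long): Boolean = {
    val v = BigDecimal(c1)
    v >= BigDecimal("0.80") * BigDecimal(c2) && v <= BigDecimal("1.20") * BigDecimal(c2)
  }

  // Made-up sample rows as (column1, column2), including boundary,
  // zero, and negative cases.
  val rows: Seq[(Long, Long)] = Seq(
    (100L, 100L), (80L, 100L), (120L, 100L),
    (79L, 100L), (121L, 100L), (0L, 0L), (-100L, -100L)
  )

  def main(args: Array[String]): Unit =
    rows.foreach { case (c1, c2) =>
      println(s"column1=$c1 column2=$c2 abs=${withinAbs(c1, c2)} between=${withinBetween(c1, c2)}")
    }
}
```

On these rows the two forms agree. Note that a row with a negative column2 fails both forms even when column1 equals column2, because the tolerance 0.20*column2 is then negative. If column2 can be negative in your data, one option is to compare against 0.20 * abs(column2) instead.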