
How to check if values of 'column1' are within +-20% range of values of 'column2' using Amazon Deequ?


I'm using Amazon Deequ in Spark, and I have a dataframe 'df' with two columns of type 'Long' (or another numeric type). I simply want to check:

value(column1) lies between value(column2)-20% and value(column2)+20% for all rows

I'm not sure what check to put here:

val verificationResult: VerificationResult = VerificationSuite()
  .onData(df)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
      //.functionToCheckThis()
  )
  .run()

Solution

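  • Check has a satisfies method that takes a column condition (a Spark SQL expression string) as its first parameter and a constraint name as its second. With the default assertion, the check passes only if the condition holds for every row.

    To check whether column1 is within ±20% of column2, you can use an expression like:

    |column1 - column2| <= 0.20*column2

    or, equivalently when column2 is non-negative, column1 between 0.80*column2 and 1.20*column2. If column2 can be negative, compare against 0.20*abs(column2) instead. Using the first form: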

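    import com.amazon.deequ.{VerificationResult, VerificationSuite}
    import com.amazon.deequ.checks.{Check, CheckLevel}

    val verificationResult: VerificationResult = {
      VerificationSuite()
        .onData(df)
        .addCheck(
          Check(CheckLevel.Error, "Review Check")
            // First argument: Spark SQL condition evaluated per row.
            // Second argument: name of the constraint shown in the results.
            .satisfies(
              "abs(column1 - column2) <= 0.20 * column2",
              "value(column1) lies between value(column2)-20% and value(column2)+20%"
            )
        ).run()
    }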
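
To sanity-check the condition on boundary values before wiring it into Deequ, the same predicate can be mirrored in plain Scala. This is only a sketch: the object ToleranceSketch and the function withinTolerance are hypothetical names, not part of Deequ or Spark, and the predicate uses abs(column2) on the right-hand side on the assumption that column2 may be negative.

```scala
// Plain-Scala mirror of the condition
//   abs(column1 - column2) <= 0.20 * abs(column2)
// ToleranceSketch and withinTolerance are hypothetical names for illustration.
object ToleranceSketch {
  // BigDecimal keeps the comparison exact at the boundary (e.g. 120 vs 100).
  def withinTolerance(
      column1: Long,
      column2: Long,
      tolerance: BigDecimal = BigDecimal("0.20")
  ): Boolean = {
    val diff = (BigDecimal(column1) - BigDecimal(column2)).abs
    diff <= tolerance * BigDecimal(column2).abs
  }

  def main(args: Array[String]): Unit = {
    println(withinTolerance(120L, 100L))  // true: exactly +20%
    println(withinTolerance(121L, 100L))  // false: just outside +20%
    println(withinTolerance(-90L, -100L)) // true: abs(column2) handles negatives
  }
}
```

Without the abs on column2, the last case would compare 10 <= -20 and fail, which is why the SQL condition may need "0.20 * abs(column2)" if negative values are possible.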