I'm using Amazon Deequ in Spark, and I have a dataframe df with a column publish_date, which is of type DateType. I simply want to check the following:
publish_date <= current_date(minus)x AND publish_date >= current_date(minus)y
where x and y are integers.
I'm not sure what check to put here:
val verificationResult: VerificationResult = { VerificationSuite()
  .onData(df)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
    //function to check this
  )
  .run()
}
You can use this Spark SQL expression:
publish_date <= date_sub(current_date(), x) AND publish_date >= date_sub(current_date(), y)
With Check's satisfies method:
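import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel}

// x and y are assumed to be Int values already in scope, with x <= y
val verificationResult: VerificationResult = { VerificationSuite()
  .onData(df)
  .addCheck(
    Check(CheckLevel.Error, "Review Check")
      .satisfies(
        // rows pass when publish_date lies in [today - y, today - x]
        s"publish_date <= date_sub(current_date(), $x) AND publish_date >= date_sub(current_date(), $y)",
        "check constraint name/description"
      )
  )
  .run()
}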
Or using between (inclusive on both bounds, so equivalent to the expression above):
publish_date between date_sub(current_date(), y) and date_sub(current_date(), x)
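If you build this condition in more than one place, it may help to generate the string from a small helper. The following is only a sketch: DateWindow and its condition method are hypothetical names, not part of Deequ, and the x <= y guard is my assumption about what you intend (with x > y the window is empty and every row would fail the check).

```scala
// Hypothetical helper, not part of Deequ: builds the Spark SQL condition
// string that can be passed to Check.satisfies.
object DateWindow {
  // column: name of the DateType column to test
  // x, y:   day offsets back from current_date(); assumed x <= y
  def condition(column: String, x: Int, y: Int): String = {
    // Guard against an empty window (assumption: the caller wants x <= y)
    require(x <= y, s"expected x <= y, got x=$x, y=$y")
    // Lower bound is today - y, upper bound is today - x (both inclusive)
    s"$column BETWEEN date_sub(current_date(), $y) AND date_sub(current_date(), $x)"
  }
}
```

The result would then go straight into the check, e.g. .satisfies(DateWindow.condition("publish_date", x, y), "publish_date within window").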