scala, amazon-web-services, amazon-deequ

How to filter rows with a column constraint in Deequ ColumnProfilerRunner?


I am new to Scala and Spark. I am exploring the Amazon Deequ library for data profiling.

How do I get the count of rows that have a particular value when using ColumnProfilerRunner()?

The AnalysisRunner has a "compliance" option. I am looking for a similar option to filter rows that comply with a given column constraint.

I have multiple columns, so I want to run the check dynamically instead of hard-coding column names.

Appreciate any help.

Thanks


Solution

  • Deequ's column profiler computes a fixed set of statistics. If you want to compute custom statistics on your data, you should use the VerificationSuite instead. Check out the examples on Deequ's GitHub page.
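
As a rough illustration of the dynamic, per-column check the question asks about, here is a minimal sketch. It uses the AnalysisRunner with the Compliance analyzer mentioned in the question rather than the VerificationSuite, because it reports a metric instead of a pass/fail result. The helper name `complianceForAllColumns` and its `target` parameter are hypothetical, and the sketch assumes a Deequ version in which `Compliance(instance, predicate)` and `successMetricsAsDataFrame` are available as in the project's README examples.

```scala
import com.amazon.deequ.analyzers.{Compliance, Size}
import com.amazon.deequ.analyzers.runners.AnalysisRunner
import com.amazon.deequ.analyzers.runners.AnalyzerContext.successMetricsAsDataFrame
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Deequ): for every column of `df`, measure the
// fraction of rows whose value equals `target`, without naming columns by hand.
def complianceForAllColumns(spark: SparkSession, df: DataFrame, target: String): DataFrame = {
  // Size() is added first so the total row count appears next to the fractions.
  val start = AnalysisRunner.onData(df).addAnalyzer(Size())

  // Register one Compliance analyzer per column. The predicate is a Spark SQL
  // expression; backticks guard column names that contain unusual characters.
  // Assumption: `target` contains no single quotes, since it is inlined as a literal.
  val builder = df.columns.foldLeft(start) { (acc, column) =>
    acc.addAnalyzer(Compliance(s"$column = $target", s"`$column` = '$target'"))
  }

  // Flatten the computed metrics into a DataFrame (entity, instance, name, value).
  successMetricsAsDataFrame(spark, builder.run())
}
```

Compliance reports the fraction of rows that satisfy the predicate, not a count, so multiplying each fraction by the Size value gives the approximate number of matching rows. If only the exact count is needed, a plain Spark `df.filter(...).count()` on each column is a simpler alternative.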