I have several data quality tests set up in my pipeline and using dbt test to execute them. Currently, using elementary
and dbt_expectations
package, I am also able to load the test results onto snowflake tables. All details (node_id, database, schema, test_name, severity…) is getting logged as records into the table.
But in order to deep dive into data quality issues, I want to identify the records that failed test cases. I currently do not see any package or a macro that can help me to get the rows of failed cases.
Is there an approach that I should follow ?
I would use the parameter store_failures
from the dbt documentation:
https://docs.getdbt.com/reference/data-test-configs
In dbt_project.yml
:
tests:
<resource-path>:
+store_failures: true
You will get all values that put your test in error in a table. Then you will need to write some SQL to find out to which lines they belong.