dbt

How to get the rows that failed the test cases?


I have several data quality tests set up in my pipeline and using dbt test to execute them. Currently, using elementary and dbt_expectations package, I am also able to load the test results onto snowflake tables. All details (node_id, database, schema, test_name, severity…) is getting logged as records into the table.

But in order to deep dive into data quality issues, I want to identify the records that failed test cases. I currently do not see any package or a macro that can help me to get the rows of failed cases.

Is there an approach that I should follow ?


Solution

  • I would use the parameter store_failures from the dbt documentation: https://docs.getdbt.com/reference/data-test-configs

    In dbt_project.yml:

    tests:
      <resource-path>:
        +store_failures: true
    

    You will get all values that put your test in error in a table. Then you will need to write some SQL to find out to which lines they belong.