Tags: sql, amazon-redshift, dbt, incremental-load

During an incremental load with dbt, I want to aggregate if the row exists, else insert


I am using dbt to incrementally load data from one schema in Redshift to another to create reports. dbt has a straightforward way to load data incrementally with an upsert. Instead of a traditional upsert, however, I want to sum the incoming rows with the existing rows in the destination table (keyed on the unique id, summing the rest of the columns) if they already exist, and insert them otherwise. For example, say I have a table:

T1(userid, total_deposit, total_withdrawal)

I have created a table that calculates the total deposit and total withdrawal for each user. When I run an incremental query, I might get a new deposit or withdrawal for an existing user; in that case, I have to add the value to the existing row instead of replacing it with an upsert. If the user is new, I just need a simple insert. Any suggestions on how to approach this?


Solution

  • dbt is quite opinionated that invocations of dbt should be idempotent. This means that you can run the same command over and over again, and the result will be the same.

    The operation you're describing is not idempotent: running it twice over the same incoming rows would add them to the existing totals twice. So you're going to have a hard time getting it to work with dbt out of the box.
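To make the idempotency problem concrete, here is a minimal sketch that uses SQLite through Python's `sqlite3` module as a stand-in for Redshift. The `additive_load` helper and the sample values are hypothetical; only the `T1` column names come from the question. It applies the same incoming batch twice, as a retried run would.

```python
import sqlite3

def additive_load(conn, batch):
    """Add each incoming row onto the stored totals (the non-idempotent approach)."""
    for userid, deposit, withdrawal in batch:
        cur = conn.execute(
            "UPDATE t1 SET total_deposit = total_deposit + ?, "
            "total_withdrawal = total_withdrawal + ? WHERE userid = ?",
            (deposit, withdrawal, userid),
        )
        if cur.rowcount == 0:  # no existing row for this user, so insert one
            conn.execute(
                "INSERT INTO t1 VALUES (?, ?, ?)", (userid, deposit, withdrawal)
            )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 ("
    "userid INTEGER PRIMARY KEY, total_deposit INTEGER, total_withdrawal INTEGER)"
)
conn.execute("INSERT INTO t1 VALUES (1, 100, 20)")  # existing user

batch = [(1, 50, 0), (2, 30, 5)]  # hypothetical incoming rows

additive_load(conn, batch)
after_first_run = conn.execute("SELECT * FROM t1 ORDER BY userid").fetchall()

additive_load(conn, batch)  # simulate re-running the same load
after_second_run = conn.execute("SELECT * FROM t1 ORDER BY userid").fetchall()

print(after_first_run)
print(after_second_run)
```

The second run changes the totals again even though no new activity arrived, which is the behaviour dbt's idempotency expectation is meant to rule out.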

    As an alternative, I would break this into two steps:

    1. Build an incremental model that appends the new activity.
    2. Create a downstream model that references the incremental model and performs the aggregations you want, calculating the totals for each user. You could very carefully craft this as an incremental model with your user_id as the unique_key (since you have all of the raw transactions in #1), but I'd start without that and add it only if it proves absolutely necessary for performance reasons, since it will add a fair bit of complexity.
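The two steps above can be sketched outside dbt as follows, again with SQLite via Python's `sqlite3` as a stand-in for Redshift. The `user_activity` table, its `txn_id` column, and both helper functions are assumptions made for illustration. In particular, the sketch assumes each raw row carries a unique transaction id so that a re-sent row can be skipped. In dbt, step 1 would be a model materialized as `incremental`, and step 2 a model that selects from it via `ref()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: append-only table of raw activity (hypothetical name and columns).
conn.execute(
    "CREATE TABLE user_activity ("
    "txn_id INTEGER PRIMARY KEY, userid INTEGER, "
    "deposit INTEGER, withdrawal INTEGER)"
)

def append_new_activity(conn, batch):
    """Insert only rows whose txn_id is not stored yet; re-sent rows are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO user_activity VALUES (?, ?, ?, ?)", batch
    )

def rebuild_totals(conn):
    """Step 2: recompute the totals table from scratch from the raw activity."""
    conn.execute("DROP TABLE IF EXISTS t1")
    conn.execute(
        "CREATE TABLE t1 AS "
        "SELECT userid, SUM(deposit) AS total_deposit, "
        "SUM(withdrawal) AS total_withdrawal "
        "FROM user_activity GROUP BY userid"
    )
    return conn.execute("SELECT * FROM t1 ORDER BY userid").fetchall()

append_new_activity(conn, [(1, 1, 100, 20)])  # earlier activity for user 1
batch = [(2, 1, 50, 0), (3, 2, 30, 5)]        # hypothetical new rows

append_new_activity(conn, batch)
totals = rebuild_totals(conn)

append_new_activity(conn, batch)  # simulate re-running the same load
totals_after_retry = rebuild_totals(conn)

print(totals)
print(totals_after_retry)
```

Because the totals are always derived from the full activity table, repeating a run leaves the result unchanged. The cost is a full re-aggregation each time, which is the performance trade-off mentioned in step 2.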

    For more info on complex incremental materializations, I suggest this Discourse post written by Tristan Handy, Founder & CEO at dbt Labs.