google-bigquerydataformelt

Configure Dataform incremental table to do nothing when matched


When creating an incremental table, the behavior is that when there is a matched, defined by "uniqueKey", the other fields will be updated. I would like to, instead, not update when there is a match, and only insert new rows without duplicates. How can I achieve this? An alternative I can think of is to write my own operation, but then I lose the benefit of previewing with the SELECT statement.

config {
    type: "incremental",
    uniqueKey: ["alarm_number"],
    bigquery: {
        partitionBy: "DATE(alarm_time)",
        clusterBy: ["alarm_number", "location", "element"],
        updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
    }
}

pre_operations {
    DECLARE trigger_checkpoint DEFAULT (
        ${when(incremental(),
        `SELECT MAX(time) FROM ${self()}`,
        `SELECT TIMESTAMP("2020-01-01")`)}
    )
}

SELECT
    *,
    FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")}
AND deviation >= 1.1

Solution

  • The way I have been approaching this case is by verifying myself if the row is already in the destination table or not using the self() reference.

    In your case, I would do something like this:

    SELECT
        s.*,
        FALSE AS notified -- this column should not be updated when there is a duplicate/merge
    FROM ${ref("source")} s
    LEFT JOIN ${self()} se on s.alarm_number = se.alarm_number
    where
        se.alarm_number is null