I have two tables:
src:
ID | model | accy | delivery_date | ETA | call_off | department | style | duration | plant |
---|---|---|---|---|---|---|---|---|---|
123abc | xxyy | MM | 2022-12-14T00:00:00.000Z | 2022-10-20T00:00:00.000Z | 2023-01-17T00:00:00.000Z | paint | pink | 3.3 | dd |
123abc | xxyy | MM | 2022-12-14T00:00:00.000Z | 2022-10-20T00:00:00.000Z | 2022-10-20T00:00:00.000Z | paint | pink | 3.3 | dd |
123abc | xxyy | MM | 2022-12-14T00:00:00.000Z | 2022-10-20T00:00:00.000Z | 2023-02-06T00:00:00.000Z | paint | pink | 3.3 | dd |
dest:
ID | model | accy | delivery_date | ETA | call_off | department | style | duration | plant |
---|---|---|---|---|---|---|---|---|---|
123abc | xxyy | MM | 2022-12-14T00:00:00.000Z | 2022-10-20T00:00:00.000Z | 2023-01-17T00:00:00.000Z | paint | pink | 3.3 | dd |
the dest has this row which is the same as the first row in the src table.
whenever I try to merge I get this error:
100090 (42P18): Duplicate row detected during DML action
Row Values: ["123abc", "xxyy", "MM", "2022-12-14T00:00:00.000Z", "2022-10-20T00:00:00.000Z", "2023-01-17T00:00:00.000Z", "paint", "pink", 3.3, "dd"]
My query:
merge into cleaned_delivery dest
using (
SELECT distinct
RECORD_CONTENT:ID::varchar AS ID,
RECORD_CONTENT:model::varchar AS model,
RECORD_CONTENT:accy::varchar AS accy,
RECORD_CONTENT:delivery_date::varchar AS delivery_date,
RECORD_CONTENT:ETA::varchar AS ETA,
RECORD_CONTENT:call_off::varchar AS call_off,
RECORD_CONTENT:department::varchar AS department,
RECORD_CONTENT:style::varchar AS style,
RECORD_CONTENT:duration::float AS duration,
RECORD_CONTENT:plant::varchar AS plant
FROM raw_delivery where RECORD_CONTENT:ID
) as src (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
on dest.ID = src.ID and dest.model = src.model and dest.accy = src.accy
when matched
then update set
dest.delivery_date = src.delivery_date,
dest.ETA = src.ETA,
dest.call_off = src.call_off,
dest.department = src.department,
dest.style = src.style,
dest.duration = src.duration
when not matched then insert (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
values (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant);
I tried adding this line after the when matched:
when matched and (src.delivery_date != dest.delivery_date or src.ETA!= dest.ETA or src.call_off!= dest.call_off)
But still the same error. How can I merge these two tables and ignore rows that already exists (duplicate)
It is possible to allow nondeterministic MERGE by setting ERROR_ON_NONDETERMINISTIC_MERGE
ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE = FALSE;
Hiding the fact that source provides multiple matching rows is not recommended course of action. A better way is to explicitly state which row should be taken - unique per columns defined on on dest.ID = src.ID and dest.model = src.model and dest.accy = src.accy
:
merge into cleaned_delivery dest
using (
SELECT
RECORD_CONTENT:ID::varchar AS ID,
RECORD_CONTENT:model::varchar AS model,
RECORD_CONTENT:accy::varchar AS accy,
RECORD_CONTENT:delivery_date::varchar AS delivery_date,
RECORD_CONTENT:ETA::varchar AS ETA,
RECORD_CONTENT:call_off::varchar AS call_off,
RECORD_CONTENT:department::varchar AS department,
RECORD_CONTENT:style::varchar AS style,
RECORD_CONTENT:duration::float AS duration,
RECORD_CONTENT:plant::varchar AS plant
FROM raw_delivery
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID, model, accy
ORDER BY <here_goes_columns>) = 1
) AS src
on dest.ID = src.ID
and dest.model = src.model
and dest.accy = src.accy