I have MERGE query in Oracle DB.
MERGE INTO TABLE_1 T1
USING
(
SELECT DISTINCT SMS_CENTER_ID FROM TABLE_2 T2 WHERE T2.DELIVERY_TIME >= ? AND T2.DELIVERY_TIME < ?) VIEW_1
ON
(T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID AND T1.CREATE_TIME >= ? AND T1.CREATE_TIME < ?)
WHEN MATCHED THEN UPDATE
SET T1.DELETED = 1
Questions:
Can the following change improve performance on large amounts of data? (such that the duration of the query is up to 5 hours)?
MERGE INTO TABLE_1 T1
USING
(
SELECT DISTINCT SMS_CENTER_ID FROM TABLE_2 T2 WHERE T2.DELIVERY_TIME >= ? AND T2.DELIVERY_TIME < ?) VIEW_1
ON
(T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID)
WHEN MATCHED THEN UPDATE
SET T1.DELETED = 1
WHERE T1.CREATE_TIME >= ? AND T1.CREATE_TIME < ?
Or does query performance stay the same?
Can the following change improve performance on large amounts of data? (such that the duration of the query is up to 5 hours)?
[...]
Or does query performance stay the same?
Neither option, moving the filter from the ON
clause to a WHERE
clause in the UPDATE
part of the MERGE
will probably make the performance worse (rather than better or the same).
Given the sample data:
CREATE TABLE table_1 (sms_center_id, create_time, deleted) AS
SELECT 1, DATE '2024-01-01' + LEVEL - 1, 0
FROM DUAL
CONNECT BY DATE '2024-01-01' + LEVEL - 1 < DATE '2025-01-01';
CREATE TABLE table_2 (sms_center_id, delivery_time) AS
SELECT 1, DATE '2024-01-01' FROM DUAL;
(Note: table_1
has 366 rows and table_2
has 1 row.)
Then you can ensure the statistics are up-to-date on both tables:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS( USER, 'TABLE_1');
DBMS_STATS.GATHER_TABLE_STATS( USER, 'TABLE_2');
END;
/
Then trying to update 200 out of the 366 rows:
EXPLAIN PLAN FOR
MERGE INTO TABLE_1 T1
USING (
SELECT DISTINCT SMS_CENTER_ID
FROM TABLE_2 T2
WHERE T2.DELIVERY_TIME >= DATE '2024-01-01'
AND T2.DELIVERY_TIME < DATE '2025-01-01'
) VIEW_1
ON (
T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID
AND T1.CREATE_TIME >= DATE '2024-01-01'
AND T1.CREATE_TIME < DATE '2024-01-01' + 200
)
WHEN MATCHED THEN UPDATE
SET T1.DELETED = 1;
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
Generates the plan:
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 201 | 402 | 7 (15)| 00:00:01 |
| 1 | MERGE | TABLE_1 | | | | |
| 2 | VIEW | | | | | |
|* 3 | HASH JOIN | | 201 | 5226 | 7 (15)| 00:00:01 |
| 4 | VIEW | | 1 | 13 | 4 (25)| 00:00:01 |
| 5 | SORT UNIQUE | | 1 | 11 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| TABLE_2 | 1 | 11 | 3 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | TABLE_1 | 201 | 2613 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T1"."SMS_CENTER_ID"="VIEW_1"."SMS_CENTER_ID")
6 - filter("T2"."DELIVERY_TIME">=TO_DATE(' 2024-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "T2"."DELIVERY_TIME"<TO_DATE(' 2025-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
7 - filter("T1"."CREATE_TIME"<TO_DATE(' 2024-07-19 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "T1"."CREATE_TIME">=TO_DATE(' 2024-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
and:
EXPLAIN PLAN FOR
MERGE INTO TABLE_1 T1
USING (
SELECT DISTINCT SMS_CENTER_ID
FROM TABLE_2 T2
WHERE T2.DELIVERY_TIME >= DATE '2024-01-01'
AND T2.DELIVERY_TIME < DATE '2025-01-01'
) VIEW_1
ON (
T1.SMS_CENTER_ID = VIEW_1.SMS_CENTER_ID
)
WHEN MATCHED THEN UPDATE
SET T1.DELETED = 1
WHERE T1.CREATE_TIME >= DATE '2024-01-01'
AND T1.CREATE_TIME < DATE '2024-01-01' + 200;
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
Generates the plan:
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 366 | 8052 | 7 (15)| 00:00:01 |
| 1 | MERGE | TABLE_1 | | | | |
| 2 | VIEW | | | | | |
|* 3 | HASH JOIN | | 366 | 9516 | 7 (15)| 00:00:01 |
| 4 | VIEW | | 1 | 13 | 4 (25)| 00:00:01 |
| 5 | SORT UNIQUE | | 1 | 11 | 4 (25)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| TABLE_2 | 1 | 11 | 3 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | TABLE_1 | 366 | 4758 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T1"."SMS_CENTER_ID"="VIEW_1"."SMS_CENTER_ID")
6 - filter("T2"."DELIVERY_TIME">=TO_DATE(' 2024-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND "T2"."DELIVERY_TIME"<TO_DATE(' 2025-01-01
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
The first plan, where the filter is in the ON
clause, has an additional filter and operates on (an estimate of) 201 rows. The second plan, has no additional filter and operates on all 366 rows so it is going to generate a greater volume of undo/redo logs as it works on all possible matches and not just the required subset of those rows.