oracle-databasequery-optimizationunionsql-insert

Pros and cons of `INSERT INTO` versus `UNION ALL`


I am working with an Oracle database in which each year's data is stored in different tables. I am not the DBA so I cannot change that. Also, I do not have the permission to consult execution plans or to create views.

I need to extract data for several years and collect it in a separate table.

The following query with UNION ALL runs in 24.169 secondes:

CREATE TABLE tab_union AS
SELECT * FROM tab2020

UNION ALL
SELECT * FROM tab2021

UNION ALL
SELECT * FROM tab2022;

The following query with INSERT INTO (with NOLOGGING mode and /*+ APPEND */ hint, as discussed in this question) runs in 60.702 secondes

CREATE TABLE tab_insert NOLOGGING AS
SELECT * FROM tab2020;

INSERT /*+ APPEND */ INTO tab_insert
SELECT * FROM tab2021;

INSERT /*+ APPEND */ INTO tab_insert
SELECT * FROM tab2022;

Solution

  • CTAS (create table as select) is the fastest load method because it creates a brand new segment with nothing else to get in the way while it loads the data. Insert with append can perform pretty decently as well, but working with an existing segment always requires more work under the covers than creating a brand new one (consistency, locks, etc..).

    There's less undo allocation, as an insert is DML so it has to be able to undo the changes it made to the segment header. A CTAS (which is a DDL) can simply mark the temporary segment it was creating as non-existent if it fails - it never (fully) entered the dictionary, so cleanup is much easier.

    Also, if you have indexes that insert append has to maintain them, and that can be costly. If you're in PDML mode that will helpfully postpone maintenance and rebuild them at the end, which is great, but a CTAS has by definition no indexes at all so there's no rebuild step or any other kind of index maintenance to bother with - even better.

    And thirdly, you can get parallel threads more easily (using the DEGREE clause) than you can with an insert append since that requires PDML which has number of restrictions that can cause it to not engage.

    But, CTAS means you're modifying the dictionary and that will invalidate objects (like PL/SQL procedures) that have hard dependencies on it when you drop the old one before recreating the new one. So it shouldn't be used extensively without a solid understanding of the impacts. But it's definitely faster.

    To the last question, it's not UNION ALL that is faster, it's doing the whole operation in a single CTAS, rather than a first third via CTAS and the second and third via insert append.