sqlquery-optimizationvertica

DML operation performance for large table with multiple projections in vertica


I have table with 3billion records, and have 2 projections on the same. Currently delete operation is taking around 3-4 hours in daily loads.

So having multiple projections impacts data loads or DML operations in vertica. Or is there any other better way to tune the delete operation in vertica.

DELETE FROM TABLE1 WHERE EXISTS (SELECT 1 FROM TABLE2 WHERE ID = TABLE1.ID); Table1 have 3b records while Table2 have 50k records.

Projection1 for Table1 have ID column while Projection2 dont have ID column in it.


Solution

  • To be more efficient, the statement should be:

    DELETE FROM table1 WHERE id IN (
      SELECT id FROM table2
    );
    

    But also that can be inefficient if id is not part of all projections of table1