We have a huge DB cleanup action coming up: around 100 million rows will be deleted in total, across 40 tables. Here are my ideas, and I'm open to suggestions.
#1 approach
Bulk collect, then delete while logging which rows are deleted, committing every 100 rows.
Example:
--define record and needed variables
commit_counter NUMBER := 0;
COMMIT_LIMIT CONSTANT NUMBER := 100;
v_total_deleted_services NUMBER := 0;
TYPE t_record_entity_test IS RECORD (
ENTITY_ID NUMBER,
SOURCE VARCHAR2(100),
SOURCE_ID VARCHAR2(100),
MESSAGE_ID VARCHAR2(100),
STATUS VARCHAR2(200)
);
TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
v_records_test t_record_entity_tests;
-- define the cursor
CURSOR c_services IS
SELECT --all the data needed--
OPEN c_services;
LOOP
FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
EXIT WHEN v_records_test.COUNT = 0;
FORALL i IN 1..v_records_test.COUNT
INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED, Status: ' || v_records_test(i).STATUS, SYSDATE);
FORALL i IN 1..v_records_test.COUNT
DELETE FROM SERVICE WHERE ENTITY_ID = v_records_test(i).ENTITY_ID;
v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;
commit_counter := commit_counter + v_records_test.COUNT;
IF commit_counter >= COMMIT_LIMIT THEN
COMMIT;
commit_counter := 0;
END IF;
END LOOP;
CLOSE c_services;
COMMIT;
--log number of deleted rows
#2 approach
Bulk collect and log which rows are being deleted, then delete everything at once and commit at the end. I don't know if that's OK, since there could be around 10 million rows deleted in a single one of these actions.
--define record and needed variables
v_total_deleted_services NUMBER := 0;
TYPE t_record_entity_test IS RECORD (
ENTITY_ID NUMBER,
SOURCE VARCHAR2(100),
SOURCE_ID VARCHAR2(100),
MESSAGE_ID VARCHAR2(100),
STATUS VARCHAR2(200)
);
TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
v_records_test t_record_entity_tests;
-- define the cursor
CURSOR c_services IS
SELECT --all the data needed--
OPEN c_services;
LOOP
FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
EXIT WHEN v_records_test.COUNT = 0;
FORALL i IN 1..v_records_test.COUNT
INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED, Status: ' || v_records_test(i).STATUS, SYSDATE);
END LOOP;
CLOSE c_services;
DELETE FROM SERVICE WHERE ENTITY_ID IN (/* select the ENTITY_IDs to delete -- the same rows the cursor returned */);
v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;
COMMIT;
--log number of deleted rows
Which is the better approach, and is there a third one that would be better than these two?
If you can perform this operation during application downtime and you are deleting a significant portion of a large table, it is far more efficient to create a new segment containing the rows you want to keep than to delete the ones you don't. The most efficient option is a CTAS and replace:
CREATE TABLE abc$new PARALLEL (DEGREE 16) AS SELECT * FROM abc WHERE [rows-I-want-to-keep];
ALTER TABLE abc RENAME TO abc$old;
ALTER TABLE abc$new RENAME TO abc;
The downside is that you also have to script out and reapply any subordinate objects such as indexes, constraints, triggers, and grants. But this is the most efficient way (the least processing time) to accomplish the task.
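If it helps, DBMS_METADATA can do that scripting for you: capture the dependent DDL from the original table before the swap, then edit the object names and replay it against the new table. A minimal sketch, assuming the table is in your own schema and using the example name abc (GET_DEPENDENT_DDL raises an error for object types the table doesn't have, so run only the ones that apply):
SET LONG 1000000 PAGESIZE 0  -- SQL*Plus settings so the full CLOBs print
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('INDEX', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('CONSTRAINT', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('REF_CONSTRAINT', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('TRIGGER', 'ABC') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('OBJECT_GRANT', 'ABC') FROM dual;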
A slightly less efficient method, which requires less care with subordinate objects, uses the same concept but moves the data twice so that the original object remains the permanent one:
CREATE TABLE abc$old NOLOGGING PARALLEL (DEGREE 16) AS SELECT * FROM abc;
TRUNCATE TABLE abc;
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(abc,16) */ INTO abc SELECT * FROM abc$old WHERE [rows-I-want-to-keep];
COMMIT;
The downside here is that for a time the table is empty, so your app had better be down. Both techniques leave you with a table that no longer has the rows you wished to delete, plus another table with the original contents in case you need to recover. You can then plan on dropping the abc$old table at a later time, once you are sure the data won't be needed, to free up the space.
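When that time comes, for example:
DROP TABLE abc$old PURGE;  -- PURGE bypasses the recycle bin so the space is released immediately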
Of course, if you must perform these maintenance operations online while the application is using the tables, that requirement pushes you back to a gradual, batched delete process of some kind, such as the ones you're considering. That will be much, much slower, but it is less invasive.
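If you do end up there, the batched delete doesn't necessarily need the bulk-collect plumbing at all; a plain loop that deletes a bounded slice per transaction also works. A rough sketch against your SERVICE table, where status = 'OBSOLETE' is just a hypothetical stand-in for whatever predicate identifies the candidate rows:
DECLARE
  v_rows_deleted  PLS_INTEGER;
  v_total_deleted PLS_INTEGER := 0;
BEGIN
  LOOP
    DELETE FROM SERVICE
     WHERE status = 'OBSOLETE'   -- hypothetical predicate; substitute your real criteria
       AND ROWNUM <= 10000;      -- cap each transaction at 10,000 rows
    v_rows_deleted := SQL%ROWCOUNT;
    EXIT WHEN v_rows_deleted = 0;
    v_total_deleted := v_total_deleted + v_rows_deleted;
    COMMIT;                      -- keeps undo small; the next iteration picks up the rest
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Deleted ' || v_total_deleted || ' rows from SERVICE');
END;
/
Each iteration is its own transaction, so an interrupted run simply resumes where it left off, at the cost of re-finding the remaining candidate rows on every pass.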