I am trying to figure out what are the best options to perform archive and purge given our situation.
We have roughly 50 million records in say Table A. We want to archive data into a target table and then purge those data in the source table. We would like to retain the data base on several criteria that overlap with each other. For example, we want to retain the data from the past 5 months in addition to keeping all the records with say Indicator='True'. Indicator='True' will likely return records beyond 5 months. This means I have to use OR condition in order to capture the data. Base on the conditions, we would need to retain 10 million records and archive/purge 40 million records. I would need to create a process that will run every 6 months to do this.
My question is, what are the most efficient options for me to get this done for both archiving and purging? Would a PROC/bulk delete/insert be my best option?
Partition seems to be out of the question since there are several conditions that overlap with each other.
Use composite partitioning, e.g. range (for your time dimension) and list (to distinct between the rows that should be kept long and limited time.
Example
The rows with KEEP_ID='N'
should be eliminated after 5 months.
CREATE TABLE tab
( id NUMBER(38,0),
trans_dt DATE,
keep_id VARCHAR2(1)
)
PARTITION BY RANGE (trans_dt) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY LIST (keep_id)
SUBPARTITION TEMPLATE
( SUBPARTITION p_catalog VALUES ('Y'),
SUBPARTITION p_internet VALUES ('N')
)
(PARTITION p_init VALUES LESS THAN (TO_DATE('01-JAN-2019','dd-MON-yyyy'))
);
Populate with sample data for 6 months
insert into tab (id, trans_dt, keep_id)
select rownum, add_months(date'2019-08-01', trunc((rownum-1) / 2)), decode(mod(rownum,2),0,'Y','N')
from dual connect by level <= 12;
select * from tab
order by trans_dt, keep_id;
ID TRANS_DT KEEP_ID
---------- ------------------- -------
1 01.08.2019 00:00:00 N --- this subpartition should be deleted
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
5 01.10.2019 00:00:00 N
6 01.10.2019 00:00:00 Y
7 01.11.2019 00:00:00 N
8 01.11.2019 00:00:00 Y
9 01.12.2019 00:00:00 N
10 01.12.2019 00:00:00 Y
11 01.01.2020 00:00:00 N
12 01.01.2020 00:00:00 Y
Now use partition extended names to reference the subpartition that should be dropped.
Drop subpartition older than 5 months, but only for KEEP_ID = 'N'
alter table tab drop subpartition for (DATE'2019-08-01','N');
New data
ID TRANS_DT KEEP_ID
---------- ------------------- -------
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
.....