Am attempting to design an Ab Initio load process without any Ab Initio training or documentation. Yeah, I know. One design decision: the incoming data files will contain both inserts and updates. Should I have the feed provider split them into two data files (1–10 GB in size nightly) and have Ab Initio do the inserts and updates separately?
A problem I see with that is that data isn't always what you expect it to be... an "insert" row may already be present (perhaps a purge failed or the feed provider made a mistake), or an "update" row may not be present.
So I'm wondering if I should just combine all inserts and updates and use an Oracle MERGE statement (after parallel-loading the data into a staging table with no indexes, of course).
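Something like the following is what I have in mind; the table and column names (target_tbl, stg_tbl, id, col1, col2) are just placeholders, not my real schema:

```sql
-- Upsert staged rows into the target in a single pass.
-- MERGE decides per row, so it doesn't matter whether a given
-- row arrived flagged as an "insert" or an "update".
MERGE INTO target_tbl t
USING stg_tbl s
   ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1,
             t.col2 = s.col2
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2)
  VALUES (s.id, s.col1, s.col2);
```

That would sidestep the "insert row already present" and "update row missing" problems entirely.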
But I don't know whether Ab Initio supports MERGE or not.
There aren't many Ab Initio tutorials or docs on the web... can you direct me to anything good?
I would certainly not rely on a source system to tell me whether rows are present in the target table or not. My instinct says to go for a parallel, nologging (if possible), compress (if possible) load into a staging table followed by a MERGE -- if Ab-Initio does not support MERGE then hopefully it supports a call to a PL/SQL procedure, or direct execution of a SQL statement.
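As a rough sketch of that staging approach (all object names below are made up for illustration):

```sql
-- Staging table: no indexes or constraints, NOLOGGING so a
-- direct-path load generates minimal redo, COMPRESS for bulk data.
CREATE TABLE stg_load (
  id    NUMBER,
  col1  VARCHAR2(100),
  col2  DATE
) NOLOGGING COMPRESS;

-- Direct-path, parallel load from wherever the feed file lands
-- (e.g. an external table over the flat file). The APPEND hint
-- requests direct path; parallel DML must be enabled first:
--   ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(stg_load, 8) */ INTO stg_load
SELECT id, col1, col2
FROM   ext_feed_file;
COMMIT;
```

If the ETL tool can only call a stored procedure, the subsequent MERGE can simply live inside a PL/SQL procedure that the tool invokes after the load.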
If this is a large amount of data, I'd also look at hash partitioning on the join key for both the new and current data sets.
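A hash-partitioned version of such a staging table might look like this (names and partition count are illustrative):

```sql
-- Hash-partition the staging table on the merge key. If the target
-- is hash-partitioned the same way (same key, same number of
-- partitions), the optimizer can use a full partition-wise join,
-- which parallelises the merge nicely.
CREATE TABLE stg_load (
  id    NUMBER,
  col1  VARCHAR2(100),
  col2  DATE
)
PARTITION BY HASH (id)
PARTITIONS 16          -- a power of two spreads rows evenly
NOLOGGING COMPRESS;
```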