databaseterminologydata-warehouse

What does "incremental load" mean?


I regularly see the expression 'incremental loading' when reading articles

What does is really (technically) mean? What does it implies ?

Explanations using use-cases are welcome.


Solution

  • It generally means only loading into the warehouse the records that have changed (inserts, updates, and deletes if applicable) since the last load; as opposed to doing a full load of all the data (all records, including those that haven't changed since the last load) into the warehouse.

    The advantage is that it reduces the amount of data being transferred from system to system, as a full load may take hours / days to complete depending on volume of data.

    The main disadvantage is around maintainability. With a full load, if there's an error you can re-run the entire load without having to do much else in the way of cleanup / preparation. With an incremental load, the files generally need to be loaded in order. So if you have a problem with one batch, others queue up behind it until you correct it. Alternately you may find an error in a batch from a few days ago, and need to re-load that batch once corrected, followed by every subsequent batch in order to ensure that the data in the warehouse is consistent.