Situation
I'm working on making an inventory history for an application that tracks over 200,000 serialized devices. The goal is to be able to look back in time and know exactly where a device was on X day.
I recognize that the ideal situation would likely be to fire a trigger to update the history when an item changes immediately, however that would be a very large scale task and one that would not be implemented easily with the current application.
With that in mind, we have decided to run a nightly script that checks the current inventory and stores that into an inventory_history
table that tracks the location, status, etc of the inventory. Originally, we tried just plopping in the history every single day. IE inserting 200,000 rows each day which results in over a million records every 5 days. We found that this would result in GBs of data in less than a year. My proposed solution is to implement it in a version control style history. So instead of inserting 200,000 records every day, only insert ones that have changed. (And insert deleted records for ones that were removed.)
Question(s)
Notes
When we are looking at the history, we sometimes want to know a specific device, and other times we want an inventory summary report for that day.
You may want to create two history tables, the first quickly determines what changed from the last time the process was run and the second tracking the history of inventory changes.
The first table is a copy of your inventory table as it existed last time you ran this nightly (daily, hourly, etc.) process. (I would add a datetime field to track when the that process was last run). This table along with your real inventory table determine what has been INSERTED, DELETED or UPDATED by using the Serial Number and relevant fields (location, status, etc).
The second table is the inventory change log (meaning SERIAL XXX has multiple entries). Populate this table whenever inventory changes by copying the record found in the previous table to this table. Add another field: ACTION with values (INSERT, UPDATE, DELETE).
Pseudo code:
To populate inserts and updates
INSERT INTO inventory_transaction (serial_number,lastupdated,ACTION, location, status)
SELECT inventory.serial_number, NOW()
, IF(inventory_history.serial_number IS NULL, 'INSERT', 'UPDATE')
, inventory.location
, inventory.status
FROM inventory
LEFT JOIN inventory_history
ON inventory.serial_number = inventory_history.serial_number
WHERE NOT (inventory.location <=> inventory_history.location
AND inventory.status <=> inventory_history.status);
Deletes:
INSERT INTO inventory_transaction (serial_number, lastupdated, ACTION)
SELECT inventory_history.serial_number, NOW(), 'DELETE'
FROM inventory_history
LEFT JOIN inventory
ON inventory.serial_number = inventory_history.serial_number
WHERE inventory.serial_number IS NULL;
Populate the inventory inventory_history table:
TRUNCATE TABLE inventory_history;
INSERT INTO inventory_history(serial_number,lastchecked, location, status)
SELECT serial_number, NOW(), location, status
FROM inventory;
To see what the inventory looked like at a given point in time:
SELECT inventory_transaction.*
FROM (SELECT serial_number, MAX(lastupdated) as last_date
FROM inventory_transaction
WHERE lastupdated <= 'point in time'
GROUP BY serial_number) AS correct_time
JOIN inventory_transaction
ON inventory_transaction.serial_number = correct_time.serial_number
AND inventory_transaction.lastupdated = correct_time.lastupdated
AND inventory_transaction.ACTION <> 'DELETED'