sqloracle-databaseanalyticsoracle-apexanalytical

Reconstruct historical table rows, based only on change-log data


I'm working on some complex sales analytics, which is very convoluted ... and boring ...

So for this question I'll use a fun, sugary metaphor: vending machines.
But my actual tables are structured the same way.
(You can assume there's plenty of indexing, constraints, etc.)

Let's say we have a table containing vending machine inventory data.
This table simply shows exactly how many, of each type of candy, is currently available in each vending machine.

I know, normally, there would be an ITEM_TYPE table containing rows for 'Snickers', 'Milky Way' etc. but that isn't how our tables are constructed, for multiple reasons.
In reality, it's not a product count, but instead it's pre-aggregated sales data: "Pipeline Total", "Forecast Total", etc.
So a simple table with separate columns for different "types" of totals, is what I have to work with.

For this example, I also added some text columns, to demonstrate that I have to account for a variety of datatypes.
(This complicates things.)

Except for ID, all columns are nullable - this is a real issue.
As far as we're concerned, if the column is NULL, then NULL is the official value we need to use for analytics and reporting.

enter image description here

CREATE table "VENDING_MACHINES" (
    "ID"                 NUMBER NOT NULL ENABLE,
    "SNICKERS_COUNT"     NUMBER,
    "MILKY_WAY_COUNT"    NUMBER,
    "TWIX_COUNT"         NUMBER,
    "SKITTLES_COUNT"     NUMBER,
    "STARBURST_COUNT"    NUMBER,
    "SWEDISH_FISH_COUNT" NUMBER,
    "FACILITIES_ADDRESS" VARCHAR2(100),
    "FACILITIES_CONTACT" VARCHAR2(100),

    CONSTRAINT "VENDING_MACHINES_PK" PRIMARY KEY ("ID") USING INDEX ENABLE
)
/

Example data:

INSERT INTO VENDING_MACHINES (ID, SNICKERS_COUNT, MILKY_WAY_COUNT, TWIX_COUNT,
                              SKITTLES_COUNT, STARBURST_COUNT, SWEDISH_FISH_COUNT,
                              FACILITIES_ADDRESS, FACILITIES_CONTACT)
SELECT 225, 11, 15, 14, 0, NULL, 13, '123 Abc Street', 'Steve' FROM DUAL UNION ALL
SELECT 349, NULL, 7, 3, 11, 8, 7, NULL, '' FROM DUAL UNION ALL
SELECT 481, 8, 4, 0, NULL, 14, 3, '1920 Tenaytee Way', NULL FROM DUAL UNION ALL
SELECT 576, 4, 2, 8, 4, 9, NULL, '', 'Angela' FROM DUAL

Vending machines will periodically connect to the database and update their inventory records.
Maybe they update every time somebody buys something, or maybe they update every 30 minutes, or maybe they only update when someone refills the candy - honestly it doesn't matter.

What does matter is, whenever a record is updated in the VENDING_MACHINES table, a trigger is executed which logs every individual change in a separate log table VENDING_MACHINES_CHANGE_LOG.
This trigger has already been written, and it works great.
(If a column is "updated" with the same value that was already present, the change should be ignored by the trigger.)

A separate row is logged for each column that was modified in the VENDING_MACHINES table (except for ID).
Therefore, if a brand new row is inserted in the VENDING_MACHINES table (i.e. a new vending machine), eight rows will be logged in the VENDING_MACHINES_CHANGE_LOG table - one for each non-ID column in VENDING_MACHINES.

(In my real-world scenario, there are 90+ columns being tracked.
But usually only one or two columns are being updated at any given time, so it doesn't get out of hand.)

This "change log" is intended to be a permanent history of the VENDING_MACHINES table, so we won't create a foreign key constraint - if a row is deleted from VENDING_MACHINES we want to retain orphaned historical records in the change log.
Also, Apex doesn't support ON UPDATE CASCADE (?), so the trigger has to check for updates to the ID column, and manually propagate the update throughout related tables (e.g. the change log).

enter image description here

CREATE table "VENDING_MACHINE_CHANGE_LOG" (
    "ID"                   NUMBER       NOT NULL ENABLE,
    "CHANGE_TIMESTAMP"     TIMESTAMP(6) NOT NULL ENABLE,
    "VENDING_MACHINE_ID"   NUMBER       NOT NULL ENABLE,
    "MODIFIED_COLUMN_NAME" VARCHAR2(30) NOT NULL ENABLE,

    "MODIFIED_COLUMN_TYPE" VARCHAR2(30) GENERATED ALWAYS AS
        (CASE "MODIFIED_COLUMN_NAME" WHEN 'FACILITIES_ADDRESS' THEN 'TEXT'
                                     WHEN 'FACILITIES_CONTACT' THEN 'TEXT'
                                     ELSE 'NUMBER' END) VIRTUAL NOT NULL ENABLE,

    "NEW_NUMBER_VALUE"     NUMBER,
    "NEW_TEXT_VALUE"       VARCHAR2(4000),

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_CK" CHECK
        ("MODIFIED_COLUMN_NAME" IN('SNICKERS_COUNT', 'MILKY_WAY_COUNT', 'TWIX_COUNT',
                                   'SKITTLES_COUNT', 'STARBURST_COUNT', 'SWEDISH_FISH_COUNT',
                                   'FACILITIES_ADDRESS', 'FACILITIES_CONTACT')) ENABLE,

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_PK" PRIMARY KEY ("ID") USING INDEX ENABLE,

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_UK" UNIQUE ("CHANGE_TIMESTAMP",
                                                       "VENDING_MACHINE_ID",
                                                       "MODIFIED_COLUMN_NAME") USING INDEX ENABLE

    /* No foreign key, since we want this change log to be orphaned and preserved.
       Also, apparently Apex doesn't support ON UPDATE CASCADE for some reason? */
)
/

Change log example data:

INSERT INTO VENDING_MACHINE_CHANGE_LOG (ID, CHANGE_TIMESTAMP, VENDING_MACHINE_ID,
                                        MODIFIED_COLUMN_NAME, NEW_NUMBER_VALUE, NEW_TEXT_VALUE)
SELECT 167, '11/06/19 05:18', 481, 'MILKY_WAY_COUNT', 5, NULL FROM DUAL UNION ALL
SELECT 168, '11/06/19 05:21', 225, 'SWEDISH_FISH_COUNT', 1, NULL FROM DUAL UNION ALL
SELECT 169, '11/06/19 05:40', 481, 'FACILITIES_ADDRESS', NULL, NULL FROM DUAL UNION ALL
SELECT 170, '11/06/19 05:49', 481, 'STARBURST_COUNT', 4, NULL FROM DUAL UNION ALL
SELECT 171, '11/06/19 06:09', 576, 'FACILITIES_CONTACT', NULL, '' FROM DUAL UNION ALL
SELECT 172, '11/06/19 06:25', 481, 'SWEDISH_FISH_COUNT', 7, NULL FROM DUAL UNION ALL
SELECT 173, '11/06/19 06:40', 481, 'FACILITIES_CONTACT', NULL, 'Audrey' FROM DUAL UNION ALL
SELECT 174, '11/06/19 06:46', 576, 'SNICKERS_COUNT', 13, NULL FROM DUAL UNION ALL
SELECT 175, '11/06/19 06:55', 576, 'FACILITIES_ADDRESS', NULL, '388 Holiday Street' FROM DUAL UNION ALL
SELECT 176, '11/06/19 06:59', 576, 'SWEDISH_FISH_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 177, '11/06/19 07:00', 349, 'MILKY_WAY_COUNT', 3, NULL FROM DUAL UNION ALL
SELECT 178, '11/06/19 07:03', 481, 'TWIX_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 179, '11/06/19 07:11', 349, 'TWIX_COUNT', 15, NULL FROM DUAL UNION ALL
SELECT 180, '11/06/19 07:31', 225, 'FACILITIES_CONTACT', NULL, 'William' FROM DUAL UNION ALL
SELECT 181, '11/06/19 07:49', 576, 'FACILITIES_CONTACT', NULL, 'Brian' FROM DUAL UNION ALL
SELECT 182, '11/06/19 08:28', 481, 'SNICKERS_COUNT', 0, NULL FROM DUAL UNION ALL
SELECT 183, '11/06/19 08:38', 481, 'SKITTLES_COUNT', 7, '' FROM DUAL UNION ALL
SELECT 184, '11/06/19 09:04', 349, 'MILKY_WAY_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 185, '11/06/19 09:21', 481, 'SNICKERS_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 186, '11/06/19 09:33', 225, 'SKITTLES_COUNT', 11, NULL FROM DUAL UNION ALL
SELECT 187, '11/06/19 09:45', 225, 'FACILITIES_CONTACT', NULL, NULL FROM DUAL UNION ALL
SELECT 188, '11/06/19 10:16', 481, 'FACILITIES_CONTACT', 4, 'Lucy' FROM DUAL UNION ALL
SELECT 189, '11/06/19 10:25', 481, 'SNICKERS_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 190, '11/06/19 10:57', 576, 'SWEDISH_FISH_COUNT', 12, NULL FROM DUAL UNION ALL
SELECT 191, '11/06/19 10:59', 225, 'MILKY_WAY_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 192, '11/06/19 11:11', 481, 'STARBURST_COUNT', 6, 'Stanley' FROM DUAL UNION ALL
SELECT 193, '11/06/19 11:34', 225, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 194, '11/06/19 11:39', 349, 'FACILITIES_CONTACT', NULL, 'Mark' FROM DUAL UNION ALL
SELECT 195, '11/06/19 11:42', 576, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 196, '11/06/19 11:56', 225, 'TWIX_COUNT', 2, NULL FROM DUAL

I need to build a view that reconstructs the complete historical VENDING_MACHINES table, using only data from the VENDING_MACHINE_CHANGE_LOG table.
i.e. Since the change log rows are allowed to be orphaned, rows that have previously been deleted from VENDING_MACHINES should reappear.
The resulting view should allow me to retreive any VENDING_MACHINE row, exactly as it existed at any specific point in history.

The example data for VENDING_MACHINE_CHANGE_LOG is very short, and not quite enough to produce a complete result ...
But it should be enough to demonstrate the desired outcome.

Ultimately I think analytical functions will be required.
But I'm new to SQL analytical functions, and I'm new to Oracle and Apex as well.
So I'm not sure how to approach this - What's the best way to reconstruct the original table rows?

Here's what the desired result should look like (sorted by CHANGE_TIMESTAMP):

enter image description here

And here is the same desired result, additionally sorted by VENDING_MACHINE_ID:

enter image description here

I have built a simple query to pull the most recent column values for every VENDING_MACHINE_ID, but I don't think this method is suitable for this enormous task.
I think I need to use analytical functions instead, to get better performance and flexibility. (Or maybe I'm wrong?)

select vmcl.ID,
       vmcl.CHANGE_TIMESTAMP,
       vmcl.VENDING_MACHINE_ID,
       vmcl.MODIFIED_COLUMN_NAME,
       vmcl.MODIFIED_COLUMN_TYPE,
       vmcl.NEW_NUMBER_VALUE,
       vmcl.NEW_TEXT_VALUE

from ( select sqvmcl.VENDING_MACHINE_ID,
              sqvmcl.MODIFIED_COLUMN_NAME,
              max(sqvmcl.CHANGE_TIMESTAMP) as LAST_CHANGE_TIMESTAMP
       from VENDING_MACHINE_CHANGE_LOG sqvmcl
       where sqvmcl.CHANGE_TIMESTAMP <= /*[Current timestamp, or specified timestamp]*/
       group by sqvmcl.VENDING_MACHINE_ID, sqvmcl.MODIFIED_COLUMN_NAME ) sq

left join VENDING_MACHINE_CHANGE_LOG vmcl on vmcl.VENDING_MACHINE_ID = sq.VENDING_MACHINE_ID
                                         and vmcl.MODIFIED_COLUMN_NAME = sq.MODIFIED_COLUMN_NAME
                                         and vmcl.CHANGE_TIMESTAMP = sq.LAST_CHANGE_TIMESTAMP

Notice the left join specifically hits the unique index for the VENDING_MACHINE_CHANGE_LOG table - this is by design.


Solution

  • I am going to ignore my feeling that this is an "XY problem" and answer this question only:

    [How do I] Reconstruct historical table rows, based only on change-log data[?]

    (For an approach for what I suspect might be the "real" problem, see this link about Flashback Archives in Oracle 12c: https://docs.oracle.com/database/121/ADFNS/adfns_flashback.htm#ADFNS01004)

    For what you've got, I believe this is the query (for your view definition) that you are looking for:

    SELECT 
        c.id change_id,
        c.change_timestamp as_of_timestamp,
        c.vending_machine_id,
        NULLIF(last_value(case when c.modified_column_name = 'SNICKERS_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) snickers_count,
        NULLIF(last_value(case when c.modified_column_name = 'MILKY_WAY_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) MILKY_WAY_COUNT,
        NULLIF(last_value(case when c.modified_column_name = 'TWIX_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) TWIX_COUNT,
        NULLIF(last_value(case when c.modified_column_name = 'SKITTLES_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SKITTLES_COUNT,
        NULLIF(last_value(case when c.modified_column_name = 'STARBURST_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) STARBURST_COUNT,
        NULLIF(last_value(case when c.modified_column_name = 'SWEDISH_FISH_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SWEDISH_FISH_COUNT,
        NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_ADDRESS' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_ADDRESS,
        NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_CONTACT' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_CONTACT
    FROM 
        VENDING_MACHINE_CHANGE_LOG c
    ORDER BY 
        c.vending_machine_id, c.change_timestamp;
    

    Basically, you have three problems:

    1. How do you account for different data types that might be stored in each column?
    2. How do you account for null values?
    3. How do you make the query run efficiently?

    Problem #1 is answered by the fact that you are writing the logic for each view column by hand, so it is simple for view definition to use NEW_NUMBER_VALUE for, say, the SNICKERS_COUNT column and to use NEW_TEXT_VALUE for the FACILITIES_ADDRESS column.

    Problem #2 is trickier. Consider the SNICKERS_COUNT column. You need to ignore changes that are not changes to SNICKERS_COUNT. It is easy to ignore those by making them null. However, the actual change values may also be null, and we don't want to ignore those. So, we must designate a non-null value to stand in for null values that we don't want to ignore. This designated value must be one that can never appear in the actual data. For numeric columns, I've chosen -99999 and for text columns, I've chosen '#NULL#'.

    Problem #3 I've ignored. The very nature of your question is going to require that you read all the change logs from the beginning of time to build up their values as a given point in time. I don't see you getting off without a full table scan of VENDING_MACHINE_CHANGE_LOG.

    So, let's break down one of the columns in the query to see what it is doing:

    nullif(
      last_value(
         case when c.modified_column_name = 'SNICKERS_COUNT' 
              THEN nvl(c.new_number_value,-99999) 
              ELSE NULL END) 
      ignore nulls 
      over ( partition by c.vending_machine_id 
             order by c.change_timestamp asc 
             range between unbounded preceding and current row)
     ,-99999) snickers_count,
    

    Start with this inner expression:

    case when c.modified_column_name = 'SNICKERS_COUNT' 
                  THEN nvl(c.new_number_value,-99999) 
                  ELSE NULL END
    

    If the modified column is not SNICKERS_COUNT, the expression is NULL. That is the only way it can be null. If the new_number_value was NULL, we convert it to our designated stand-in (-99999).

    Then,

    last_value(...case expression above...)
      ignore nulls 
      over ( partition by c.vending_machine_id 
             order by c.change_timestamp asc 
             range between unbounded preceding and current row)
    

    ... this tells Oracle to take the most recent non-null value for the case expression, with "most recent" being defined as the row with the highest change_timestamp for the set of rows that have the same vending_machine_id as the current row and only including changes up to the current row.

    And finally,

    nullif(... last_value expression above...
     ,-99999) snickers_count
    

    This converts the designated stand-in value for null back to a true null.

    Here are the results:

    +-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
    | CHANGE_ID |         AS_OF_TIMESTAMP         | VENDING_MACHINE_ID | SNICKERS_COUNT | MILKY_WAY_COUNT | TWIX_COUNT | SKITTLES_COUNT | STARBURST_COUNT | SWEDISH_FISH_COUNT | FACILITIES_ADDRESS | FACILITIES_CONTACT |
    +-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
    |       168 | 06-NOV-19 05.21.00.000000000 AM |                225 |                |                 |            |                |                 |                  1 |                    |                    |
    |       180 | 06-NOV-19 07.31.00.000000000 AM |                225 |                |                 |            |                |                 |                  1 |                    | William            |
    |       186 | 06-NOV-19 09.33.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    | William            |
    |       187 | 06-NOV-19 09.45.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    |                    |
    |       191 | 06-NOV-19 10.59.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    |                    |
    |       193 | 06-NOV-19 11.34.00.000000000 AM |                225 |                |                 |            |              8 |                 |                  1 |                    |                    |
    |       196 | 06-NOV-19 11.56.00.000000000 AM |                225 |                |                 |          2 |              8 |                 |                  1 |                    |                    |
    |       177 | 06-NOV-19 07.00.00.000000000 AM |                349 |                |               3 |            |                |                 |                    |                    |                    |
    |       179 | 06-NOV-19 07.11.00.000000000 AM |                349 |                |               3 |         15 |                |                 |                    |                    |                    |
    |       184 | 06-NOV-19 09.04.00.000000000 AM |                349 |                |              10 |         15 |                |                 |                    |                    |                    |
    |       194 | 06-NOV-19 11.39.00.000000000 AM |                349 |                |              10 |         15 |                |                 |                    |                    | Mark               |
    |       167 | 06-NOV-19 05.18.00.000000000 AM |                481 |                |               5 |            |                |                 |                    |                    |                    |
    |       169 | 06-NOV-19 05.40.00.000000000 AM |                481 |                |               5 |            |                |                 |                    |                    |                    |
    |       170 | 06-NOV-19 05.49.00.000000000 AM |                481 |                |               5 |            |                |               4 |                    |                    |                    |
    |       172 | 06-NOV-19 06.25.00.000000000 AM |                481 |                |               5 |            |                |               4 |                  7 |                    |                    |
    |       173 | 06-NOV-19 06.40.00.000000000 AM |                481 |                |               5 |            |                |               4 |                  7 |                    | Audrey             |
    |       178 | 06-NOV-19 07.03.00.000000000 AM |                481 |                |               5 |          8 |                |               4 |                  7 |                    | Audrey             |
    |       182 | 06-NOV-19 08.28.00.000000000 AM |                481 |              0 |               5 |          8 |                |               4 |                  7 |                    | Audrey             |
    |       183 | 06-NOV-19 08.38.00.000000000 AM |                481 |              0 |               5 |          8 |              7 |               4 |                  7 |                    | Audrey             |
    |       185 | 06-NOV-19 09.21.00.000000000 AM |                481 |                |               5 |          8 |              7 |               4 |                  7 |                    | Audrey             |
    |       188 | 06-NOV-19 10.16.00.000000000 AM |                481 |                |               5 |          8 |              7 |               4 |                  7 |                    | Lucy               |
    |       189 | 06-NOV-19 10.25.00.000000000 AM |                481 |             10 |               5 |          8 |              7 |               4 |                  7 |                    | Lucy               |
    |       192 | 06-NOV-19 11.11.00.000000000 AM |                481 |             10 |               5 |          8 |              7 |               6 |                  7 |                    | Lucy               |
    |       171 | 06-NOV-19 06.09.00.000000000 AM |                576 |                |                 |            |                |                 |                    |                    |                    |
    |       174 | 06-NOV-19 06.46.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    |                    |                    |
    |       175 | 06-NOV-19 06.55.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street |                    |
    |       176 | 06-NOV-19 06.59.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street |                    |
    |       181 | 06-NOV-19 07.49.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street | Brian              |
    |       190 | 06-NOV-19 10.57.00.000000000 AM |                576 |             13 |                 |            |                |                 |                 12 | 388 Holiday Street | Brian              |
    |       195 | 06-NOV-19 11.42.00.000000000 AM |                576 |             13 |                 |            |              8 |                 |                 12 | 388 Holiday Street | Brian              |
    +-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+