I'm working on some complex sales analytics, which is very convoluted ... and boring ...
So for this question I'll use a fun, sugary metaphor: vending machines.
But my actual tables are structured the same way.
(You can assume there's plenty of indexing, constraints, etc.)
Let's say we have a table containing vending machine inventory data.
This table simply shows exactly how many, of each type of candy, is currently available in each vending machine.
I know, normally, there would be an ITEM_TYPE
table containing rows for 'Snickers', 'Milky Way' etc. but that isn't how our tables are constructed, for multiple reasons.
In reality, it's not a product count, but instead it's pre-aggregated sales data: "Pipeline Total", "Forecast Total", etc.
So a simple table with separate columns for different "types" of totals, is what I have to work with.
For this example, I also added some text columns, to demonstrate that I have to account for a variety of datatypes.
(This complicates things.)
Except for ID
, all columns are nullable - this is a real issue.
As far as we're concerned, if the column is NULL
, then NULL
is the official value we need to use for analytics and reporting.
CREATE table "VENDING_MACHINES" (
"ID" NUMBER NOT NULL ENABLE,
"SNICKERS_COUNT" NUMBER,
"MILKY_WAY_COUNT" NUMBER,
"TWIX_COUNT" NUMBER,
"SKITTLES_COUNT" NUMBER,
"STARBURST_COUNT" NUMBER,
"SWEDISH_FISH_COUNT" NUMBER,
"FACILITIES_ADDRESS" VARCHAR2(100),
"FACILITIES_CONTACT" VARCHAR2(100),
CONSTRAINT "VENDING_MACHINES_PK" PRIMARY KEY ("ID") USING INDEX ENABLE
)
/
Example data:
INSERT INTO VENDING_MACHINES (ID, SNICKERS_COUNT, MILKY_WAY_COUNT, TWIX_COUNT,
SKITTLES_COUNT, STARBURST_COUNT, SWEDISH_FISH_COUNT,
FACILITIES_ADDRESS, FACILITIES_CONTACT)
SELECT 225, 11, 15, 14, 0, NULL, 13, '123 Abc Street', 'Steve' FROM DUAL UNION ALL
SELECT 349, NULL, 7, 3, 11, 8, 7, NULL, '' FROM DUAL UNION ALL
SELECT 481, 8, 4, 0, NULL, 14, 3, '1920 Tenaytee Way', NULL FROM DUAL UNION ALL
SELECT 576, 4, 2, 8, 4, 9, NULL, '', 'Angela' FROM DUAL
Vending machines will periodically connect to the database and update their inventory records.
Maybe they update every time somebody buys something, or maybe they update every 30 minutes, or maybe they only update when someone refills the candy - honestly it doesn't matter.
What does matter is, whenever a record is updated in the VENDING_MACHINES
table, a trigger is executed which logs every individual change in a separate log table VENDING_MACHINES_CHANGE_LOG
.
This trigger has already been written, and it works great.
(If a column is "updated" with the same value that was already present, the change should be ignored by the trigger.)
A separate row is logged for each column that was modified in the VENDING_MACHINES
table (except for ID
).
Therefore, if a brand new row is inserted in the VENDING_MACHINES
table (i.e. a new vending machine), eight rows will be logged in the VENDING_MACHINES_CHANGE_LOG
table - one for each non-ID column in VENDING_MACHINES
.
(In my real-world scenario, there are 90+ columns being tracked.
But usually only one or two columns are being updated at any given time, so it doesn't get out of hand.)
This "change log" is intended to be a permanent history of the VENDING_MACHINES
table, so we won't create a foreign key constraint - if a row is deleted from VENDING_MACHINES
we want to retain orphaned historical records in the change log.
Also, Apex doesn't support ON UPDATE CASCADE
(?), so the trigger has to check for updates to the ID
column, and manually propagate the update throughout related tables (e.g. the change log).
CREATE table "VENDING_MACHINE_CHANGE_LOG" (
"ID" NUMBER NOT NULL ENABLE,
"CHANGE_TIMESTAMP" TIMESTAMP(6) NOT NULL ENABLE,
"VENDING_MACHINE_ID" NUMBER NOT NULL ENABLE,
"MODIFIED_COLUMN_NAME" VARCHAR2(30) NOT NULL ENABLE,
"MODIFIED_COLUMN_TYPE" VARCHAR2(30) GENERATED ALWAYS AS
(CASE "MODIFIED_COLUMN_NAME" WHEN 'FACILITIES_ADDRESS' THEN 'TEXT'
WHEN 'FACILITIES_CONTACT' THEN 'TEXT'
ELSE 'NUMBER' END) VIRTUAL NOT NULL ENABLE,
"NEW_NUMBER_VALUE" NUMBER,
"NEW_TEXT_VALUE" VARCHAR2(4000),
CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_CK" CHECK
("MODIFIED_COLUMN_NAME" IN('SNICKERS_COUNT', 'MILKY_WAY_COUNT', 'TWIX_COUNT',
'SKITTLES_COUNT', 'STARBURST_COUNT', 'SWEDISH_FISH_COUNT',
'FACILITIES_ADDRESS', 'FACILITIES_CONTACT')) ENABLE,
CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_PK" PRIMARY KEY ("ID") USING INDEX ENABLE,
CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_UK" UNIQUE ("CHANGE_TIMESTAMP",
"VENDING_MACHINE_ID",
"MODIFIED_COLUMN_NAME") USING INDEX ENABLE
/* No foreign key, since we want this change log to be orphaned and preserved.
Also, apparently Apex doesn't support ON UPDATE CASCADE for some reason? */
)
/
Change log example data:
INSERT INTO VENDING_MACHINE_CHANGE_LOG (ID, CHANGE_TIMESTAMP, VENDING_MACHINE_ID,
MODIFIED_COLUMN_NAME, NEW_NUMBER_VALUE, NEW_TEXT_VALUE)
SELECT 167, '11/06/19 05:18', 481, 'MILKY_WAY_COUNT', 5, NULL FROM DUAL UNION ALL
SELECT 168, '11/06/19 05:21', 225, 'SWEDISH_FISH_COUNT', 1, NULL FROM DUAL UNION ALL
SELECT 169, '11/06/19 05:40', 481, 'FACILITIES_ADDRESS', NULL, NULL FROM DUAL UNION ALL
SELECT 170, '11/06/19 05:49', 481, 'STARBURST_COUNT', 4, NULL FROM DUAL UNION ALL
SELECT 171, '11/06/19 06:09', 576, 'FACILITIES_CONTACT', NULL, '' FROM DUAL UNION ALL
SELECT 172, '11/06/19 06:25', 481, 'SWEDISH_FISH_COUNT', 7, NULL FROM DUAL UNION ALL
SELECT 173, '11/06/19 06:40', 481, 'FACILITIES_CONTACT', NULL, 'Audrey' FROM DUAL UNION ALL
SELECT 174, '11/06/19 06:46', 576, 'SNICKERS_COUNT', 13, NULL FROM DUAL UNION ALL
SELECT 175, '11/06/19 06:55', 576, 'FACILITIES_ADDRESS', NULL, '388 Holiday Street' FROM DUAL UNION ALL
SELECT 176, '11/06/19 06:59', 576, 'SWEDISH_FISH_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 177, '11/06/19 07:00', 349, 'MILKY_WAY_COUNT', 3, NULL FROM DUAL UNION ALL
SELECT 178, '11/06/19 07:03', 481, 'TWIX_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 179, '11/06/19 07:11', 349, 'TWIX_COUNT', 15, NULL FROM DUAL UNION ALL
SELECT 180, '11/06/19 07:31', 225, 'FACILITIES_CONTACT', NULL, 'William' FROM DUAL UNION ALL
SELECT 181, '11/06/19 07:49', 576, 'FACILITIES_CONTACT', NULL, 'Brian' FROM DUAL UNION ALL
SELECT 182, '11/06/19 08:28', 481, 'SNICKERS_COUNT', 0, NULL FROM DUAL UNION ALL
SELECT 183, '11/06/19 08:38', 481, 'SKITTLES_COUNT', 7, '' FROM DUAL UNION ALL
SELECT 184, '11/06/19 09:04', 349, 'MILKY_WAY_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 185, '11/06/19 09:21', 481, 'SNICKERS_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 186, '11/06/19 09:33', 225, 'SKITTLES_COUNT', 11, NULL FROM DUAL UNION ALL
SELECT 187, '11/06/19 09:45', 225, 'FACILITIES_CONTACT', NULL, NULL FROM DUAL UNION ALL
SELECT 188, '11/06/19 10:16', 481, 'FACILITIES_CONTACT', 4, 'Lucy' FROM DUAL UNION ALL
SELECT 189, '11/06/19 10:25', 481, 'SNICKERS_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 190, '11/06/19 10:57', 576, 'SWEDISH_FISH_COUNT', 12, NULL FROM DUAL UNION ALL
SELECT 191, '11/06/19 10:59', 225, 'MILKY_WAY_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 192, '11/06/19 11:11', 481, 'STARBURST_COUNT', 6, 'Stanley' FROM DUAL UNION ALL
SELECT 193, '11/06/19 11:34', 225, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 194, '11/06/19 11:39', 349, 'FACILITIES_CONTACT', NULL, 'Mark' FROM DUAL UNION ALL
SELECT 195, '11/06/19 11:42', 576, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 196, '11/06/19 11:56', 225, 'TWIX_COUNT', 2, NULL FROM DUAL
I need to build a view that reconstructs the complete historical VENDING_MACHINES
table, using only data from the VENDING_MACHINE_CHANGE_LOG
table.
i.e. Since the change log rows are allowed to be orphaned, rows that have previously been deleted from VENDING_MACHINES
should reappear.
The resulting view should allow me to retreive any VENDING_MACHINE
row, exactly as it existed at any specific point in history.
The example data for VENDING_MACHINE_CHANGE_LOG
is very short, and not quite enough to produce a complete result ...
But it should be enough to demonstrate the desired outcome.
Ultimately I think analytical functions will be required.
But I'm new to SQL analytical functions, and I'm new to Oracle and Apex as well.
So I'm not sure how to approach this - What's the best way to reconstruct the original table rows?
Here's what the desired result should look like (sorted by CHANGE_TIMESTAMP
):
And here is the same desired result, additionally sorted by VENDING_MACHINE_ID
:
I have built a simple query to pull the most recent column values for every VENDING_MACHINE_ID
, but I don't think this method is suitable for this enormous task.
I think I need to use analytical functions instead, to get better performance and flexibility. (Or maybe I'm wrong?)
select vmcl.ID,
vmcl.CHANGE_TIMESTAMP,
vmcl.VENDING_MACHINE_ID,
vmcl.MODIFIED_COLUMN_NAME,
vmcl.MODIFIED_COLUMN_TYPE,
vmcl.NEW_NUMBER_VALUE,
vmcl.NEW_TEXT_VALUE
from ( select sqvmcl.VENDING_MACHINE_ID,
sqvmcl.MODIFIED_COLUMN_NAME,
max(sqvmcl.CHANGE_TIMESTAMP) as LAST_CHANGE_TIMESTAMP
from VENDING_MACHINE_CHANGE_LOG sqvmcl
where sqvmcl.CHANGE_TIMESTAMP <= /*[Current timestamp, or specified timestamp]*/
group by sqvmcl.VENDING_MACHINE_ID, sqvmcl.MODIFIED_COLUMN_NAME ) sq
left join VENDING_MACHINE_CHANGE_LOG vmcl on vmcl.VENDING_MACHINE_ID = sq.VENDING_MACHINE_ID
and vmcl.MODIFIED_COLUMN_NAME = sq.MODIFIED_COLUMN_NAME
and vmcl.CHANGE_TIMESTAMP = sq.LAST_CHANGE_TIMESTAMP
Notice the left join
specifically hits the unique index for the VENDING_MACHINE_CHANGE_LOG
table - this is by design.
I am going to ignore my feeling that this is an "XY problem" and answer this question only:
[How do I] Reconstruct historical table rows, based only on change-log data[?]
(For an approach for what I suspect might be the "real" problem, see this link about Flashback Archives in Oracle 12c: https://docs.oracle.com/database/121/ADFNS/adfns_flashback.htm#ADFNS01004)
For what you've got, I believe this is the query (for your view definition) that you are looking for:
SELECT
c.id change_id,
c.change_timestamp as_of_timestamp,
c.vending_machine_id,
NULLIF(last_value(case when c.modified_column_name = 'SNICKERS_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) snickers_count,
NULLIF(last_value(case when c.modified_column_name = 'MILKY_WAY_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) MILKY_WAY_COUNT,
NULLIF(last_value(case when c.modified_column_name = 'TWIX_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) TWIX_COUNT,
NULLIF(last_value(case when c.modified_column_name = 'SKITTLES_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SKITTLES_COUNT,
NULLIF(last_value(case when c.modified_column_name = 'STARBURST_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) STARBURST_COUNT,
NULLIF(last_value(case when c.modified_column_name = 'SWEDISH_FISH_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SWEDISH_FISH_COUNT,
NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_ADDRESS' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_ADDRESS,
NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_CONTACT' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_CONTACT
FROM
VENDING_MACHINE_CHANGE_LOG c
ORDER BY
c.vending_machine_id, c.change_timestamp;
Basically, you have three problems:
null
values?Problem #1 is answered by the fact that you are writing the logic for each view column by hand, so it is simple for view definition to use NEW_NUMBER_VALUE
for, say, the SNICKERS_COUNT
column and to use NEW_TEXT_VALUE
for the FACILITIES_ADDRESS
column.
Problem #2 is trickier. Consider the SNICKERS_COUNT
column. You need to ignore changes that are not changes to SNICKERS_COUNT
. It is easy to ignore those by making them null
. However, the actual change values may also be null
, and we don't want to ignore those. So, we must designate a non-null
value to stand in for null
values that we don't want to ignore. This designated value must be one that can never appear in the actual data. For numeric columns, I've chosen -99999 and for text columns, I've chosen '#NULL#'.
Problem #3 I've ignored. The very nature of your question is going to require that you read all the change logs from the beginning of time to build up their values as a given point in time. I don't see you getting off without a full table scan of VENDING_MACHINE_CHANGE_LOG
.
So, let's break down one of the columns in the query to see what it is doing:
nullif(
last_value(
case when c.modified_column_name = 'SNICKERS_COUNT'
THEN nvl(c.new_number_value,-99999)
ELSE NULL END)
ignore nulls
over ( partition by c.vending_machine_id
order by c.change_timestamp asc
range between unbounded preceding and current row)
,-99999) snickers_count,
Start with this inner expression:
case when c.modified_column_name = 'SNICKERS_COUNT'
THEN nvl(c.new_number_value,-99999)
ELSE NULL END
If the modified column is not SNICKERS_COUNT
, the expression is NULL
. That is the only way it can be null. If the new_number_value
was NULL
, we convert it to our designated stand-in (-99999).
Then,
last_value(...case expression above...)
ignore nulls
over ( partition by c.vending_machine_id
order by c.change_timestamp asc
range between unbounded preceding and current row)
... this tells Oracle to take the most recent non-null value for the case expression, with "most recent" being defined as the row with the highest change_timestamp
for the set of rows that have the same vending_machine_id
as the current row and only including changes up to the current row.
And finally,
nullif(... last_value expression above...
,-99999) snickers_count
This converts the designated stand-in value for null
back to a true null
.
Here are the results:
+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
| CHANGE_ID | AS_OF_TIMESTAMP | VENDING_MACHINE_ID | SNICKERS_COUNT | MILKY_WAY_COUNT | TWIX_COUNT | SKITTLES_COUNT | STARBURST_COUNT | SWEDISH_FISH_COUNT | FACILITIES_ADDRESS | FACILITIES_CONTACT |
+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
| 168 | 06-NOV-19 05.21.00.000000000 AM | 225 | | | | | | 1 | | |
| 180 | 06-NOV-19 07.31.00.000000000 AM | 225 | | | | | | 1 | | William |
| 186 | 06-NOV-19 09.33.00.000000000 AM | 225 | | | | 11 | | 1 | | William |
| 187 | 06-NOV-19 09.45.00.000000000 AM | 225 | | | | 11 | | 1 | | |
| 191 | 06-NOV-19 10.59.00.000000000 AM | 225 | | | | 11 | | 1 | | |
| 193 | 06-NOV-19 11.34.00.000000000 AM | 225 | | | | 8 | | 1 | | |
| 196 | 06-NOV-19 11.56.00.000000000 AM | 225 | | | 2 | 8 | | 1 | | |
| 177 | 06-NOV-19 07.00.00.000000000 AM | 349 | | 3 | | | | | | |
| 179 | 06-NOV-19 07.11.00.000000000 AM | 349 | | 3 | 15 | | | | | |
| 184 | 06-NOV-19 09.04.00.000000000 AM | 349 | | 10 | 15 | | | | | |
| 194 | 06-NOV-19 11.39.00.000000000 AM | 349 | | 10 | 15 | | | | | Mark |
| 167 | 06-NOV-19 05.18.00.000000000 AM | 481 | | 5 | | | | | | |
| 169 | 06-NOV-19 05.40.00.000000000 AM | 481 | | 5 | | | | | | |
| 170 | 06-NOV-19 05.49.00.000000000 AM | 481 | | 5 | | | 4 | | | |
| 172 | 06-NOV-19 06.25.00.000000000 AM | 481 | | 5 | | | 4 | 7 | | |
| 173 | 06-NOV-19 06.40.00.000000000 AM | 481 | | 5 | | | 4 | 7 | | Audrey |
| 178 | 06-NOV-19 07.03.00.000000000 AM | 481 | | 5 | 8 | | 4 | 7 | | Audrey |
| 182 | 06-NOV-19 08.28.00.000000000 AM | 481 | 0 | 5 | 8 | | 4 | 7 | | Audrey |
| 183 | 06-NOV-19 08.38.00.000000000 AM | 481 | 0 | 5 | 8 | 7 | 4 | 7 | | Audrey |
| 185 | 06-NOV-19 09.21.00.000000000 AM | 481 | | 5 | 8 | 7 | 4 | 7 | | Audrey |
| 188 | 06-NOV-19 10.16.00.000000000 AM | 481 | | 5 | 8 | 7 | 4 | 7 | | Lucy |
| 189 | 06-NOV-19 10.25.00.000000000 AM | 481 | 10 | 5 | 8 | 7 | 4 | 7 | | Lucy |
| 192 | 06-NOV-19 11.11.00.000000000 AM | 481 | 10 | 5 | 8 | 7 | 6 | 7 | | Lucy |
| 171 | 06-NOV-19 06.09.00.000000000 AM | 576 | | | | | | | | |
| 174 | 06-NOV-19 06.46.00.000000000 AM | 576 | 13 | | | | | | | |
| 175 | 06-NOV-19 06.55.00.000000000 AM | 576 | 13 | | | | | | 388 Holiday Street | |
| 176 | 06-NOV-19 06.59.00.000000000 AM | 576 | 13 | | | | | | 388 Holiday Street | |
| 181 | 06-NOV-19 07.49.00.000000000 AM | 576 | 13 | | | | | | 388 Holiday Street | Brian |
| 190 | 06-NOV-19 10.57.00.000000000 AM | 576 | 13 | | | | | 12 | 388 Holiday Street | Brian |
| 195 | 06-NOV-19 11.42.00.000000000 AM | 576 | 13 | | | 8 | | 12 | 388 Holiday Street | Brian |
+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+