I have a starting table that looks like this:
I'm attempting to write a SQL query that will write this data into a similarly formatted table, but with additional adornments indicating when records expired and what records are active. The results would look like this:
identifier | loaddate | loadenddate | activeflag | symbol
723a90699e99ec9e00216910910384bd | 2020-04-01 | 2020-04-07 | 0 | DXB
723a90699e99ec9e00216910910384bd | 2020-04-08 | 2999-12-31 | 1 | DXB CL
Note that there are 1000s of different identifiers, some with one, two, three+ different symbols over various different timeframes.
To load initially (and not duplicate), I have the following SQL:
INSERT INTO finaltable(
listinghashkey
symbol,
loaddate,
loadenddate,
activeflag
)
SELECT
s.listinghashkey
s.symbol,
MAX(s.loaddate),
'2999-12-31 00:00:00.0',
1
FROM
startingtable s
LEFT JOIN finaltable f ON s.listinghashkey = f.listinghashkey
WHERE (f.listinghashkey IS NULL)
GROUP BY s.listinghashkey, s.symbol
Converting your initial format to the new format is pretty easy as a gaps-and-islands problem:
select identifier, symbol, min(loaddate),
nullif(max(loaddate), max_loaddate)
from (select s.*,
max(loaddate) over () as max_loaddate,
row_number() over (partition by identifier order by loaddate) as seqnum,
row_number() over (partition by identifier, symbol order by loaddate) as seqnum_2
from startingtable
) s
group by identifier, symbol, (seqnum - seqnum_2);
This eschews the "is active" flag and the arbitrary future date. It just uses NULL
for the infinite future. (You can easily tweak the logic for your version; this is just simpler to work with.)
If you have this table and you want to add the next load date, then you have can construct the next version in its entirety using union all
. The idea is to divide the processing into four steps:
The SQL looks like:
-- all completed records
select ft.identifier, ft.symbol, ft.loaddate, ft.loadenddate
from finaltable ft
where loadenddate is not null
union all
-- Basically copy over records where the new data is consistent
select ft.identifer, ft.symbol, ft.loaddate, ft.loadenddate
from finaltable ft join
oneload ol
on ft.identifier = ol.identifier and
ft.symbol = ol.symbol
where ft.loadenddate is null
union all
-- close records that are not in the new batch
select ft.identifer, ft.symbol, ft.loaddate, m.loaddate - interval '1 day'
from finaltable ft cross join
(select max(loaddate) as loaddate
from oneload
) m left join
oneload ol
on ft.identifier = ol.identifier and
ft.symbol = ol.symbol
where ft.loadenddate is null
-- finally add new records
select ol.identifer, ol.symbol, ol.loaddate, null
from oneload ol left join
finaltable ft
on ft.identifier = ol.identifier and
ft.symbol = ol.symbol and
ft.loadenddate is null
where ft.identifier is null;
I prefer to do this as a select
/replace operation rather than a series of insert
/update
steps (or using merge
perhaps). But you this is the basic idea.