Having large table T of dates, integers between 0 and 2, and isin strings like
date | V1 | V2 | V3 | V4 | isin |
---|---|---|---|---|---|
2015-01-01 | 0 | 1 | 0 | 2 | a |
2015-01-02 | 0 | 1 | 2 | 1 | b |
2015-01-05 | 1 | 2 | 0 | 1 | c |
2015-02-01 | 0 | 0 | 0 | 0 | c |
2015-02-02 | 1 | 2 | 2 | 0 | b |
2015-07-07 | 0 | 1 | 1 | 0 | a |
Looking for a query to produce a table of last non-zero values per isin.
Should be
isin | V1 | V2 | V3 | V4 |
---|---|---|---|---|
c | 1 | 2 | NULL | 1 |
b | 1 | 2 | 2 | 1 |
a | NULL | 1 | 1 | 2 |
I know a way to do it, that looks not efficient to me and is quite slow provided the table size. Can I do it more efficiently and faster?
with
isins as (select distinct isin from t),
a1 as (select max(date) as md, isin from t where v1 > 0 group by isin),
a2 as (select max(date) as md, isin from t where v2 > 0 group by isin),
a3 as (select max(date) as md, isin from t where v3 > 0 group by isin),
a4 as (select max(date) as md, isin from t where v4 > 0 group by isin),
b1 as (select t.isin, t.V1 from t inner join a1 on a1.isin = t.isin and a1.md = t.date),
b2 as (select t.isin, t.V2 from t inner join a2 on a2.isin = t.isin and a2.md = t.date),
b3 as (select t.isin, t.V3 from t inner join a3 on a3.isin = t.isin and a3.md = t.date),
b4 as (select t.isin, t.V4 from t inner join a4 on a4.isin = t.isin and a4.md = t.date)
select isins.isin, b1.V1, b2.V2, b3.V3, b4.V4 from isins
full outer join b1 on isins.isin = b1.isin
full outer join b2 on isins.isin = b2.isin
full outer join b3 on isins.isin = b3.isin
full outer join b4 on isins.isin = b4.isin
A classical way to go through all fields in 3 passes only (regardless of the number of fields) is to use window functions.
It is not immediately intuitive, because, contrary to aggregate functions (which the expected final result would make the most logical choice), window functions do not reduce the result set to 1 row per isin
,
instead they will add columns to every input row.
But what is of interest, is that those columns will be computed like aggregates over rows with the same isin
.
This introduces redundancy (each row for ISIN FR0123456789 will hold the same values for those aggregate columns),
but we will simply use a distinct
to reduce that set to 1 row per ISIN.
Thus by "3 passes" I meant:
Vx
among all rows with the same isin
Vx
populating", we explicitely ask the RDBMS to distinct
(so it makes a pass to deduplicate)We'll use two additional twists:
ignore nulls
to tell that the last Vx
value is in fact not to be considered the last if it's null
nullif
to convert 0
s to null
s (because there's no ignore 0s
)Thus your query would simply be:
select distinct
isin,
first_value(nullif(V1, 0)) ignore nulls over bydate V1,
first_value(nullif(V2, 0)) ignore nulls over bydate V2,
first_value(nullif(V3, 0)) ignore nulls over bydate V3,
first_value(nullif(V4, 0)) ignore nulls over bydate V4
from t
window bydate as (partition by isin order by t desc rows between unbounded preceding and unbounded following);
which, given the following input:
t | V1 | V2 | V3 | V4 | isin |
---|---|---|---|---|---|
2025-01-01 | 0 | 1 | 2 | 0 | abc |
2025-02-01 | 2 | 0 | 2 | 0 | abc |
2025-03-01 | 1 | 0 | 1 | 0 | abc |
2025-04-01 | 1 | 2 | 2 | 0 | abc |
2025-05-01 | 0 | 1 | 0 | 1 | abc |
2025-05-01 | 1 | 2 | 0 | 1 | def |
returns:
isin | V1 | V2 | V3 | V4 |
---|---|---|---|---|
abc | 1 | 1 | 2 | 1 |
def | 1 | 2 | NULL | 1 |
(can be seen live in a fiddle)