sqlsql-serverazure-sql-database

select last non zero by SQL


Having large table T of dates, integers between 0 and 2, and isin strings like

date V1 V2 V3 V4 isin
2015-01-01 0 1 0 2 a
2015-01-02 0 1 2 1 b
2015-01-05 1 2 0 1 c
2015-02-01 0 0 0 0 c
2015-02-02 1 2 2 0 b
2015-07-07 0 1 1 0 a

Looking for a query to produce a table of last non-zero values per isin.

Should be

isin V1 V2 V3 V4
c 1 2 NULL 1
b 1 2 2 1
a NULL 1 1 2

I know a way to do it, that looks not efficient to me and is quite slow provided the table size. Can I do it more efficiently and faster?

with 
    isins as (select distinct isin from t),
    a1 as (select max(date) as md, isin from t where v1 > 0 group by isin),
    a2 as (select max(date) as md, isin from t where v2 > 0 group by isin),
    a3 as (select max(date) as md, isin from t where v3 > 0 group by isin),
    a4 as (select max(date) as md, isin from t where v4 > 0 group by isin),
    b1 as (select t.isin, t.V1 from t inner join a1 on a1.isin = t.isin and a1.md = t.date),
    b2 as (select t.isin, t.V2 from t inner join a2 on a2.isin = t.isin and a2.md = t.date),
    b3 as (select t.isin, t.V3 from t inner join a3 on a3.isin = t.isin and a3.md = t.date),
    b4 as (select t.isin, t.V4 from t inner join a4 on a4.isin = t.isin and a4.md = t.date)

select isins.isin, b1.V1, b2.V2, b3.V3, b4.V4 from isins 
    full outer join b1 on isins.isin = b1.isin
    full outer join b2 on isins.isin = b2.isin
    full outer join b3 on isins.isin = b3.isin
    full outer join b4 on isins.isin = b4.isin

Solution

  • A classical way to go through all fields in 3 passes only (regardless of the number of fields) is to use window functions.

    It is not immediately intuitive, because, contrary to aggregate functions (which the expected final result would make the most logical choice), window functions do not reduce the result set to 1 row per isin,
    instead they will add columns to every input row.
    But what is of interest, is that those columns will be computed like aggregates over rows with the same isin.
    This introduces redundancy (each row for ISIN FR0123456789 will hold the same values for those aggregate columns),
    but we will simply use a distinct to reduce that set to 1 row per ISIN.

    Thus by "3 passes" I meant:

    We'll use two additional twists:

    Thus your query would simply be:

    select distinct
        isin,
        first_value(nullif(V1, 0)) ignore nulls over bydate V1,
        first_value(nullif(V2, 0)) ignore nulls over bydate V2,
        first_value(nullif(V3, 0)) ignore nulls over bydate V3,
        first_value(nullif(V4, 0)) ignore nulls over bydate V4
    from t
    window bydate as (partition by isin order by t desc rows between unbounded preceding and unbounded following);
    

    which, given the following input:

    t V1 V2 V3 V4 isin
    2025-01-01 0 1 2 0 abc
    2025-02-01 2 0 2 0 abc
    2025-03-01 1 0 1 0 abc
    2025-04-01 1 2 2 0 abc
    2025-05-01 0 1 0 1 abc
    2025-05-01 1 2 0 1 def

    returns:

    isin V1 V2 V3 V4
    abc 1 1 2 1
    def 1 2 NULL 1

    (can be seen live in a fiddle)