sqlsql-serverjoingaps-in-data

SQL Server filling gaps in time series


I'm having trouble with a request in SQL.

I have two tables, one representing a vector of dates and another timeseries of prices for different securities:

Dates:

DateId Date
1 2021-01-01
2 2021-01-02
3 2021-01-03

TimeSerie:

SecurityId DateId Value
1 1 0.25
1 3 0.32
2 1 0.41
2 2 0.67

The timeserie may have gaps, for example in the above table the SecurityId=1 doesn't have any row at DateId=2 and SecurityId=2 doesn't have any rows at DateId=3.

I need to build a request that fill the gaps with the last previous value in the timeserie. Resulting in having a value for all securities and all dates as follow:

SecurityId DateId Value
1 1 0.25
1 2 0.25
1 3 0.32
2 1 0.41
2 2 0.67
2 3 0.67

I tried to start matching all dates with a right join

 SELECT  * from [TimeSerie] px RIGHT JOIN Dates dt on dt.DateId = px.Dateid

Unfortunately this doesn't work as there is always a security for which the equality on DateId matches, so I don't get the TimeSeries rows having no values.

I'm working on SQL Server standard 2019 and I'am aiming to a solution based on a single query (avoiding procedures with temp tables or cursors).


Solution

  • One method would be to use CTEs and a windowed COUNT to put the data into groups and then get the MAX value of that group:

    --Sample data
    WITH Dates AS(
        SELECT *
        FROM (VALUES(1,CONVERT(date,'2021-01-01')),
                    (2,CONVERT(date,'2021-01-02')),
                    (3,CONVERT(date,'2021-01-03')))D(DateID,[Date])),
    TimeSerie AS(
        SELECT *
        FROM (VALUES(1,1,0.25),
                    (1,3,0.32),
                    (2,1,0.41),
                    (2,2,0.67))V(SecurityID,DateID,[Value])),
    --Solution
    DateSeries AS(
        SELECT DISTINCT
               D.DateID,
               D.[Date],
               TS.SecurityID
        FROM Dates D
             CROSS JOIN TimeSerie TS),
    Groups AS(
        SELECT DS.SecurityID,
               DS.DateID,
               TS.[value],
               COUNT(TS.[Value]) OVER (PARTITION BY DS.SecurityID ORDER BY [Date]) AS Grp
        FROM DateSeries DS
             LEFT JOIN TimeSerie TS ON DS.SecurityID = TS.SecurityID
                                   AND DS.DateID = TS.DateID)
    SELECT G.SecurityID,
           G.DateID,
           MAX([Value]) OVER (PARTITION BY G.SecurityID, G.Grp) AS [Value]
    FROM Groups G;