sqlsql-servergaps-and-islands

Find the Active Item for a Given Date With Time Intervals and Item Criteria


I have a problem that's similar to this one: Merge overlapping time intervals based on priority and type - SQL Server

But it's not quite the same.

I have items that have a type, a weight and an active/inactive status. I have a table that lists when a specifinc instance of a type becomes active or inactive. And I can have multiple items for a given type, each with a different weight.

When given a type and a date, I need to determine the item with the highest weight for that item's type at that given date.

My data looks something like:

Items:


Id | Name | Type | Weight 
-------------------------
1    I1      T1       2
2    I2      T1       1
3    I3      T1       3
4    I4      T2       3
5    I5      T3       3
6    I6      T3       2

We can assume that weight is unique across all items of a given type: There won't be two items of type T1 with weight = 3, for example

Status History:

ItemId | StatusDate | IsActive
------------------------------
1         1/1/2025       1
2         1/15/2025      1
3         2/1/2025       1
4         1/1/2025       1
4         2/1/2025       0
5         1/1/2025       1
6         1/15/2025      1
5         2/1/2025       0

The result should be something like:

Type | Date    | Matching Item Id
---------------------------------
T1      1/2/2025     1
T1      1/20/2025    1  
T1      2/15/2025    3
T2      1/2/2025     4
T2      2/15/2025    NULL (since the only item of this type is inactive at this date)
T3      1/2/2025     5
T3      2/15/2025    6

I tried adapting what I saw in the above post, but it doesn't deal with the deactivation. It seems to me like I want a to create a CTE that has effective Item Ids for a given type and date, something like:

Type | FromDate | ToDate   | ItemId  (NULL ToDate means still active)
-----------------------------------
T1     1/1/2025   2/1/2025    1
T1     2/1/2025    NULL       3
T2     1/1/2025   2/1/2025    4
T3     1/1/2025   2/1/2025    5
T3     2/1/2025    NULL       6

I can construct the status intervals for each Item, but I'm having problems merging these for a given type:

WITH OrderedStatusHistory AS
(
    SElECT SH.Id,
        SH.ItemId,
        I.Name,
        I.Type,
        I.Weight,
        SH.StatusDate,
        SH.IsActive,
        ROW_NUMBER() OVER (PARTITION BY I.Type
                           ORDER BY SH.StatusDate, Weight) RowNum
    FROM StatusHistory SH
    INNER JOIN Items I
        ON SH.ItemId = I.Id
),
StatusHistoryIntervals AS 
(
    SELECT SH1.ItemId,
           SH1.Name,
           SH1.Type,
           SH1.Weight,
           SH1.IsActive,
           SH1.StatusDate FromDate,
           SH2.StatusDate ToDate,
           SH1.RowNum
    FROM OrderedStatusHistory SH1
    LEFT JOIN OrderedStatusHistory SH2
        ON SH1.ItemId = SH2.ItemId
        AND SH1.RowNum = SH2.RowNum-1
)

But I can't get any farther. It seems like it's a gap-and-island problem, but I just can't work it out.
I'm using this information in a JOIN with another table, not sure if it's relevant enough to include that here, except I'll mention that my first attempt involved a subquery in the JOIN to get the item's weight - something like:

SELECT *
FROM MyTable T
INNER JOIN Items I
  ON T.Type = I.Type 
  AND T.Weight = 
    (SELECT MAX(Weight) 
        FROM Item I2
        INNER JOIN StatusHistoryIntervals SH
            ON I2.Id = SH.ItemId
        WHERE I2.Type = SH.Type
            AND SH.IsActive = 1
            AND ((SH.RowNum = 1 AND T.SomeDate < SH.FromDate) -- Assume that the first status is effective for all events prior to that status date
                OR (T.SomeDate BETWEEN SH.FromDate AND SH.ToDate) -- The event falls into the status interval
                OR (T.SomeDate > SH.FromDate AND SH.ToDate IS NULL)))) 

This doesn't work when there are multiple items of the same type with different weights with different, overlapping active status intervals, though - the highest is always returned, even though it's active after SomeDate.

Here's a fiddle with the table creation and population, and the CTEs for the Status History Intervals:

https://sqlfiddle.com/sql-server/online-compiler?id=83ea1514-23ca-4ede-8eaf-8ce6360e8390


Solution

  • You can select the best matching row (or null) using an OUTER APPLY (SELECT TOP 1 ...) operation that:

    1. Filters the candidate matches,
    2. Orders by whatever criteria you define as best, and then
    3. Selects the top 1 matching row.
    ;WITH OrderedStatusHistory AS ...,
    StatusHistoryIntervals AS ...
    SELECT T.*, SH1.ItemId AS MatchingItemId
    FROM MyTable T
    OUTER APPLY (
        SELECT TOP 1 *
        FROM StatusHistoryIntervals SH
        WHERE SH.Type = T.Type
        AND (SH.RowNum = 1 OR SH.FromDate <= T.Date) -- First occurence extends to BOT
        AND (SH.ToDate IS NULL OR SH.ToDate > T.Date) -- Null end date extends to EOT
        AND SH.IsActive = 1
        ORDER BY
            CASE WHEN SH.FromDate <= T.Date THEN 1 ELSE 2 END, -- Prefer actual date match
            SH.Weight DESC
    ) SH1
    

    An OUTER APPLY is like a LEFT JOIN to a subselect with the advantage that the inner query can reference values from the outer query. (A CROSS APPLY is similar, except that it behaves like an INNER JOIN.)

    See this db<>fiddle for a demo.

    Results:

    Type Date MatchingItemId
    T1 2025-01-02 1
    T1 2025-01-20 1
    T1 2025-02-15 3
    T2 2025-01-02 4
    T2 2025-02-15 null
    T3 2025-01-02 5
    T3 2025-02-15 6

    See also How to Join to first row and Top 1 with a left join. In this case we need to use an OUTER APPLY instead of CROSS APPLY to preserve the no-match result.