sqlsql-servertemporal-tablestemporal-database

How to perform a time-sequenced (temporal) LEFT OUTER JOIN


TL;DR In the book Developing Time-Oriented Database Applications in SQL by Richard T. Snodgrass, section 6.3.1 he demonstrates how to perform a sequenced inner join on a transaction-time state table, but does not discuss how to perform a sequenced left outer join.

Initial setup

Suppose the following events occur:

The resulting transaction-time state tables are these:

Note: I have deliberately combined the history and current table, unlike the official SQL server temporal tables feature which has them separated. This is similar to querying temporal tables with FOR SYSTEM_TIME ALL when using the official temporal table feature in SQL Server 2016

Orders

Id UserId ValidFrom ValidTo
0 0 2025-01-01 2025-01-04
0 1 2025-01-04 9999-12-31
1 1 2025-01-01 9999-12-31

Users

Id Name ValidFrom ValidTo
0 John 2025-01-01 2025-01-03
0 Jon 2025-01-03 9999-12-31
1 Mary 2025-01-02 9999-12-31

DDL/DML for Copy Paste

CREATE TABLE Orders (
    Id INT,
    UserId INT ,
    ValidFrom DATE,
    ValidTo DATE,
);

INSERT INTO Orders
VALUES (0, 0, '2025-01-01', '2025-01-04'),
       (0, 1, '2025-01-04', '9999-12-31'),
       (1, 1, '2025-01-01', '9999-12-31');

CREATE TABLE Users (
    Id INT,
    Name VARCHAR(50),
    ValidFrom DATE,
    ValidTo DATE
);

INSERT INTO Users
VALUES (0, 'John', '2025-01-01', '2025-01-03'),
       (0, 'Jon', '2025-01-03', '9999-12-31'),
       (1, 'Mary', '2025-01-02', '9999-12-31');

Now, I can use the following query to get all the current orders, and the associated user name:

SELECT O.Id, U.Name 
FROM Orders O
LEFT JOIN Users U ON U.Id = O.UserId
WHERE O.ValidTo = '9999-12-31' 
  AND U.ValidTo = '9999-12-31'

Results

O.Id U.Name
0 Mary
1 Mary

Question

I want to see how the results of the above query have changed over time. What query should I write to get the results shown below?

Desired Results

O.Id U.Name ValidFrom ValidTo Note (for reference, not actually part of the query)
0 John 2025-01-01 2025-01-03 User 0 + Order 0 are created
1 NULL 2025-01-01 2025-01-02 Order 1 is created
1 Mary 2025-01-02 9999-12-31 User 1 is created
0 Jon 2025-01-03 2025-01-04 User 0 is renamed
0 Mary 2025-01-04 9999-12-31 Order 0 is moved from User 0 to User 1

Solution

  • As coded in your attempt, the LEFT JOIN is not working as intended, because once it matches a row (having an overlapping date range), it will not also supply a null match for the non-overlapping case.

    One solution would be to use a CTE (common table expression) to generate additional User rows for the since beginning-of-time cases and UNION them with the actual source rows. If you ever have user IDs having no rows extending to the end-of-time, you might also need to generate rows for those cases.

    A standard test for overlapping date ranges (having exclusive end dates) is Start1 < End2 AND Start2 < End1. This can be included in your join condition. The resulting overlap date range can then be calculated as GREATEST(start1, start2) and LEAST(end1, end2).

    The following query should suit your needs:

    WITH ExtendedUsers AS (
        SELECT Id, Name, ValidFrom, ValidTo
        FROM Users
        UNION
        -- Beginning of time extensions
        SELECT Id, NULL, '1900-01-01' AS ValidFrom, MIN(ValidFrom) AS ValidTo
        FROM Users
        GROUP BY Id
    )
    SELECT
        O.Id, U.Name,
        GREATEST(O.ValidFrom, U.ValidFrom) ValidFrom,
        LEAST(O.ValidTo, U.ValidTo) ValidTo
    FROM Orders O
    LEFT JOIN ExtendedUsers U
        ON U.Id = O.UserId
        AND U.ValidFrom < O.ValidTo
        AND O.ValidFrom < U.ValidTo
    ORDER BY ValidFrom, O.Id;
    

    The LEFT JOIN will still handle the case where there are no matching User rows. If this situation never occurs, the query can be changed to use an inner join.

    Results:

    Id Name ValidFrom ValidTo
    0 John 2025-01-01 2025-01-03
    1 null 2025-01-01 2025-01-02
    1 Mary 2025-01-02 9999-12-31
    0 Jon 2025-01-03 2025-01-04
    0 Mary 2025-01-04 9999-12-31

    See this db<>query for a demo.

    If you need to generate activity notes (and they weren't just included in your post for reference), you can:

    1. Use the OUTER APPLY(SELECT TOP 1 ...) pattern to select the prior active Order and User rows,
    2. Generate notes based on various tests,
    3. Combine those notes into a single value using STRING_AGG(), and
    4. Add the combined notes to your final select list.
    WITH ...
    SELECT ..., N.Notes
    FROM ...
    OUTER APPLY (
        SELECT TOP 1 O2.*
        FROM Orders O2
        WHERE O2.Id = O.Id
        AND O2.ValidFrom < GREATEST(O.ValidFrom, U.ValidFrom)
        ORDER BY O2.ValidFrom DESC
    ) OPrior
    OUTER APPLY (
        SELECT TOP 1 U2.*
        FROM Users U2
        WHERE U2.Id = U.Id
        AND U2.ValidFrom < GREATEST(O.ValidFrom, U.ValidFrom)
        ORDER BY U2.ValidFrom DESC
    ) UPrior
    CROSS APPLY (
        SELECT STRING_AGG(N1.Note, ', ') WITHIN GROUP(ORDER BY Seq) AS Notes
        FROM (
            SELECT 'Order Created' AS Note, 1 AS Seq
            WHERE OPrior.Id IS NULL
            UNION ALL
            SELECT 'User Reassigned' AS Note, 2 AS Seq
            WHERE OPrior.UserId <> O.UserId
            UNION ALL
            SELECT 'User Created' AS Note, 3 AS Seq
            WHERE UPrior.Id IS NULL
            AND U.ValidFrom > '1900-01-01'
            UNION ALL
            SELECT 'User Renamed' AS Note, 4 AS Seq
            WHERE UPrior.Id = U.Id
            AND UPrior.Name <> U.Name
        ) N1
    ) N
    ...;
    

    Results:

    Id Name ValidFrom ValidTo Notes
    0 John 2025-01-01 2025-01-03 Order Created, User Created
    1 null 2025-01-01 2025-01-02 Order Created
    1 Mary 2025-01-02 9999-12-31 User Created
    0 Jon 2025-01-03 2025-01-04 User Renamed
    0 Mary 2025-01-04 9999-12-31 User Reassigned

    See this db<>fiddle.

    I'll leave the specifics of formatting the notes with additional details to the OP.

    For the above to work efficiently, I would recommend indexes on the following:

    Side note: I would question how a database with any kind of referential integrity could create an Orders row containing a UserId value, where the initial Orders.ValidFrom pre-dates the initial Users.ValidFrom. If this situation did not exists, the need for the LEFT JOIN and dummy since beginning-of-time rows would not be an issue.