sqlsql-serverjoinself-join

SQL JOIN to the same table explanation


I am refactoring some code that was written by another person that is no longer around. In this query a MAX(Date) is being selected, but then joined to they same table via MAX(Locator). The locator is an arbitrary sequential number that is assigned when a record is created, so essentially the MAX(Locator) would return the most recent record the same as MAX(Date). Is there any purpose to this seeming redundancy?

SELECT DISTINCT
    MAX(T.USERDATE2) AS BillPayDate
INTO
    #PersonBillPayAccounts
FROM 
    [TRACKING] T
JOIN
    (SELECT ACCOUNT_NUMBER, MAX(Locator) AS Locator
     FROM [TRACKING]
     WHERE Type = 32
       AND (userdate2 IS NOT NULL OR userdate2 != '') 
     GROUP BY ACCOUNT_NUMBER) L ON T.ACCOUNT_NUMBER = L.ACCOUNT_NUMBER 
                                AND T.LOCATOR = L.Locator 
                                AND T.Type = 32

Solution

  • Say you have data like this:

    Account_Number Locator UserDate2 Type
    1 A 20240725 32
    1 B 20240724 32
    2 A 20240721 32
    2 B 20240722 32

    The JOIN forces you to only use dates from the greater locator within each account. So you end up using July 24 instead of July 25 from account 1, and July 22 instead of July 21 from account 2 (but this is removed by the MAX() aggregation).

    But that's the old way. The newer way uses Window Functions, and should perform much better:

    WITH Dates As (
        SELECT
            FIRST_VALUE(UserDate2) OVER (PARTITION BY Account_Number ORDER BY Locator DESC) BillPayDate
        FROM Tracking
        WHERE UserDate2 IS NOT NULL AND Type=32
    )
    SELECT Max(BillPayDate) BillPayDate
    FROM Dates
    

    See it work here:

    https://dbfiddle.uk/XqR-FhH-


    By the way, the userdate2 != '' excerpt tells us this schema is broken. It is not okay to use varchar columns to store dates. Also, the DISTINCT was meaningless in the context of a MAX() aggregation. Finally, I strongly doubt a temp table was a good use of this result.