sqlsql-serverdateclosest

Find the distinct closest date match for each record from 2 tables


Suppose I have Table1 and Table2 with the below data. I want to find the next closest Date match for Table 1(JoiningDt) that is available in Table2.ClosestDt for each ID. Note: Closest Match from Table 2 should be greater than the the date of Table1 and no 2 ID can have the same closest Match(Eg: ID 2 should take the value 08-Apr-2024 even though 07-Apr-2024 is the nearest one because it is already taken by ID 1.)

Table1:

ID JoiningDt DocNum
1 05-Apr-2024 A123
2 06-Apr-2024 A123
3 04-Apr-2024 B123

Table 2

DocNum ClosestDt
A123 03-Apr-2024
A123 04-Apr-2024
A123 07-Apr-2024
A123 08-Apr-2024
B123 02-Apr-2024
B123 05-Apr-2024

My Expected output is:

ID JoiningDt DocNum ClosestDt
1 05-Apr-2024 A123 07-Apr-2024
2 06-Apr-2024 A123 08-Apr-2024
3 04-Apr-2024 B123 05-Apr-2024

When I tried an left outer join , I get

ID JoiningDt DocNum ClosestDt
1 2024-04-05 A123 2024-04-07
1 2024-04-05 A123 2024-04-08
2 2024-04-06 A123 2024-04-07
2 2024-04-06 A123 2024-04-08
3 2024-04-04 B123 2024-04-05
select t1.ID ,t1.JoiningDt, t1.DocNum, (t2.ClosestDt)
from #Table1 t1
left join #Table2 t2 on
    t1.DocNum = t2.DocNum
    and t2.ClosestDt > t1.JoiningDt

I tried using rownumber also but the challenging part is getting the next match (8-apr and not 7-apr) for id 2 because 7-apr is already taken by Id 1.


Solution

  • You can use a correlated subquery, where you can pick the nearest date.

    For such dynamics, you need to handle the data at least twice. The basic idea is to use rownumber to determine where two dates a selected two or more tines and then simply choose the next in line, depending on the rownumber it has.

    This implies that there ia sufficient data in table 2 to fill the column, else you need some more programming, to determine which dates to take next, after the last.

    CREATE TABLE Table1 (
      ID INTEGER,
      JoiningDt DATETIME,
      DocNum VARCHAR(4)
    );
    
    INSERT INTO Table1
      (ID, JoiningDt, DocNum)
    VALUES
      ('1', '05-Apr-2024', 'A123'),
      ('2', '06-Apr-2024', 'A123'),
      ('3', '04-Apr-2024', 'B123');
    
    CREATE TABLE Table2 (
      DocNum VARCHAR(4),
      ClosestDt DATETIME
    );
    
    INSERT INTO Table2
      (DocNum, ClosestDt)
    VALUES
      ('A123', '03-Apr-2024'),
      ('A123', '04-Apr-2024'),
      ('A123', '07-Apr-2024'),
      ('A123', '08-Apr-2024'),
      ('B123', '02-Apr-2024'),
      ('B123', '05-Apr-2024');
    
    9 rows affected
    
    WITH CTE1 As (
    SELECT t1.ID, t1.JoiningDt, t1.DocNum,
      (SELECT TOP 1 ClosestDt FROM Table2 
      WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt  ASC   ) ClosestDt
      , ROW_NUMBER() OVER(PARTITION BY DocNum,   (SELECT TOP 1 ClosestDt FROM Table2 
      WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt  ASC   ) ORDER BY ID) rn
    FROM Table1 t1
     )
    SELECT ID, JoiningDt, DocNum, 
      CASE WHEN rn = 1 then ClosestDt ELSE
      (SELECT ClosestDt FROM Table2 
      WHERE DocNum = c1.DocNum AND ClosestDt > c1.JoiningDt ORDER BY ClosestDt  ASC   
      OFFSET c1.rn -1  ROWS FETCH NEXT 1 ROWS ONLY) END 
      FROM CTE1 c1
    
    ID JoiningDt DocNum (No column name)
    1 2024-04-05 00:00:00.000 A123 2024-04-07 00:00:00.000
    2 2024-04-06 00:00:00.000 A123 2024-04-08 00:00:00.000
    3 2024-04-04 00:00:00.000 B123 2024-04-05 00:00:00.000

    fiddle

    Another approach with two cte, so that the subselect doesn't run twice, but it is like the first only rewritten

    WITH CTE1 As (
    SELECT t1.ID, t1.JoiningDt, t1.DocNum,
      (SELECT TOP 1 ClosestDt FROM Table2 
      WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt  ASC   ) ClosestDt
    FROM Table1 t1
     ), CTE2 AS (
      SELECT
           ID, JoiningDt, DocNum, ClosestDt
        , ROW_NUMBER() OVER(PARTITION BY DocNum,   ClosestDt ORDER BY ID) rn
      FROM CTE1
      )
    SELECT ID, JoiningDt, DocNum, 
      CASE WHEN rn = 1 then ClosestDt ELSE
      (SELECT ClosestDt FROM Table2 
      WHERE DocNum = c1.DocNum AND ClosestDt > c1.JoiningDt ORDER BY ClosestDt  ASC   
      OFFSET c1.rn -1  ROWS FETCH NEXT 1 ROWS ONLY) END 
      FROM CTE2 c1
    
    ID JoiningDt DocNum (No column name)
    1 2024-04-05 00:00:00.000 A123 2024-04-07 00:00:00.000
    2 2024-04-06 00:00:00.000 A123 2024-04-08 00:00:00.000
    3 2024-04-04 00:00:00.000 B123 2024-04-05 00:00:00.000

    fiddle

    A slow approach that should better be done in a programming language is to use a cursorm which is a slow approach and with to much rows, this would take a long time, so you should limit the number of rows for table 1

    DECLARE @ID int
    DECLARE @JoiningDt DATETIME
    DECLARE @DocNum VARCHAR(4)
    
    DECLARE @MyTableVar TABLE (
        ID INT NOT NULL,
        JoiningDt datetime,
        DocNum VARCHAR(4),
        NearestDate DATETIME);
    
    
    DECLARE db_cursor CURSOR FOR 
    SELECT ID, JoiningDt, DocNum FROM Table1
    
    OPEN db_cursor  
    FETCH NEXT FROM db_cursor INTO @ID, @JoiningDt, @DocNum
    
    WHILE @@FETCH_STATUS = 0  
    BEGIN  
    INSERT INTO @MyTableVar
    SELECT TOP 1 @ID, @JoiningDt, @DocNum,ClosestDt   FROM table2
    WHERE DocNum = @DocNum AND ClosestDt > @JoiningDt     
      AND NOT EXISTS (SELECT 1 FROM  @MyTableVar WHERE NearestDate  = ClosestDt)
      ORDER BY ClosestDt ASC; 
    
          FETCH NEXT FROM db_cursor INTO @ID, @JoiningDt, @DocNum
    END 
    
    CLOSE db_cursor  
    DEALLOCATE db_cursor
    SELECT * FROM @MyTableVar ORDER BY ID 
    
    
    ID JoiningDt DocNum NearestDate
    1 2024-04-05 00:00:00.000 A123 2024-04-07 00:00:00.000
    2 2024-04-06 00:00:00.000 A123 2024-04-08 00:00:00.000
    3 2024-04-04 00:00:00.000 B123 2024-04-05 00:00:00.000
    4 2024-04-07 00:00:00.000 A123 2024-04-09 00:00:00.000
    5 2024-04-08 00:00:00.000 A123 2024-04-11 00:00:00.000

    fiddle