Suppose I have Table1 and Table2 with the below data. I want to find the next closest Date match for Table 1(JoiningDt) that is available in Table2.ClosestDt for each ID. Note: Closest Match from Table 2 should be greater than the the date of Table1 and no 2 ID can have the same closest Match(Eg: ID 2 should take the value 08-Apr-2024 even though 07-Apr-2024 is the nearest one because it is already taken by ID 1.)
Table1:
ID | JoiningDt | DocNum |
---|---|---|
1 | 05-Apr-2024 | A123 |
2 | 06-Apr-2024 | A123 |
3 | 04-Apr-2024 | B123 |
Table 2
DocNum | ClosestDt |
---|---|
A123 | 03-Apr-2024 |
A123 | 04-Apr-2024 |
A123 | 07-Apr-2024 |
A123 | 08-Apr-2024 |
B123 | 02-Apr-2024 |
B123 | 05-Apr-2024 |
My Expected output is:
ID | JoiningDt | DocNum | ClosestDt |
---|---|---|---|
1 | 05-Apr-2024 | A123 | 07-Apr-2024 |
2 | 06-Apr-2024 | A123 | 08-Apr-2024 |
3 | 04-Apr-2024 | B123 | 05-Apr-2024 |
When I tried an left outer join , I get
ID | JoiningDt | DocNum | ClosestDt |
---|---|---|---|
1 | 2024-04-05 | A123 | 2024-04-07 |
1 | 2024-04-05 | A123 | 2024-04-08 |
2 | 2024-04-06 | A123 | 2024-04-07 |
2 | 2024-04-06 | A123 | 2024-04-08 |
3 | 2024-04-04 | B123 | 2024-04-05 |
select t1.ID ,t1.JoiningDt, t1.DocNum, (t2.ClosestDt)
from #Table1 t1
left join #Table2 t2 on
t1.DocNum = t2.DocNum
and t2.ClosestDt > t1.JoiningDt
I tried using rownumber also but the challenging part is getting the next match (8-apr and not 7-apr) for id 2 because 7-apr is already taken by Id 1.
You can use a correlated subquery, where you can pick the nearest date.
For such dynamics, you need to handle the data at least twice. The basic idea is to use rownumber to determine where two dates a selected two or more tines and then simply choose the next in line, depending on the rownumber it has.
This implies that there ia sufficient data in table 2 to fill the column, else you need some more programming, to determine which dates to take next, after the last.
CREATE TABLE Table1 (
ID INTEGER,
JoiningDt DATETIME,
DocNum VARCHAR(4)
);
INSERT INTO Table1
(ID, JoiningDt, DocNum)
VALUES
('1', '05-Apr-2024', 'A123'),
('2', '06-Apr-2024', 'A123'),
('3', '04-Apr-2024', 'B123');
CREATE TABLE Table2 (
DocNum VARCHAR(4),
ClosestDt DATETIME
);
INSERT INTO Table2
(DocNum, ClosestDt)
VALUES
('A123', '03-Apr-2024'),
('A123', '04-Apr-2024'),
('A123', '07-Apr-2024'),
('A123', '08-Apr-2024'),
('B123', '02-Apr-2024'),
('B123', '05-Apr-2024');
9 rows affected
WITH CTE1 As (
SELECT t1.ID, t1.JoiningDt, t1.DocNum,
(SELECT TOP 1 ClosestDt FROM Table2
WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt ASC ) ClosestDt
, ROW_NUMBER() OVER(PARTITION BY DocNum, (SELECT TOP 1 ClosestDt FROM Table2
WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt ASC ) ORDER BY ID) rn
FROM Table1 t1
)
SELECT ID, JoiningDt, DocNum,
CASE WHEN rn = 1 then ClosestDt ELSE
(SELECT ClosestDt FROM Table2
WHERE DocNum = c1.DocNum AND ClosestDt > c1.JoiningDt ORDER BY ClosestDt ASC
OFFSET c1.rn -1 ROWS FETCH NEXT 1 ROWS ONLY) END
FROM CTE1 c1
ID | JoiningDt | DocNum | (No column name) |
---|---|---|---|
1 | 2024-04-05 00:00:00.000 | A123 | 2024-04-07 00:00:00.000 |
2 | 2024-04-06 00:00:00.000 | A123 | 2024-04-08 00:00:00.000 |
3 | 2024-04-04 00:00:00.000 | B123 | 2024-04-05 00:00:00.000 |
Another approach with two cte, so that the subselect doesn't run twice, but it is like the first only rewritten
WITH CTE1 As (
SELECT t1.ID, t1.JoiningDt, t1.DocNum,
(SELECT TOP 1 ClosestDt FROM Table2
WHERE DocNum = t1.DocNum AND ClosestDt > t1.JoiningDt ORDER BY ClosestDt ASC ) ClosestDt
FROM Table1 t1
), CTE2 AS (
SELECT
ID, JoiningDt, DocNum, ClosestDt
, ROW_NUMBER() OVER(PARTITION BY DocNum, ClosestDt ORDER BY ID) rn
FROM CTE1
)
SELECT ID, JoiningDt, DocNum,
CASE WHEN rn = 1 then ClosestDt ELSE
(SELECT ClosestDt FROM Table2
WHERE DocNum = c1.DocNum AND ClosestDt > c1.JoiningDt ORDER BY ClosestDt ASC
OFFSET c1.rn -1 ROWS FETCH NEXT 1 ROWS ONLY) END
FROM CTE2 c1
ID | JoiningDt | DocNum | (No column name) |
---|---|---|---|
1 | 2024-04-05 00:00:00.000 | A123 | 2024-04-07 00:00:00.000 |
2 | 2024-04-06 00:00:00.000 | A123 | 2024-04-08 00:00:00.000 |
3 | 2024-04-04 00:00:00.000 | B123 | 2024-04-05 00:00:00.000 |
A slow approach that should better be done in a programming language is to use a cursorm which is a slow approach and with to much rows, this would take a long time, so you should limit the number of rows for table 1
DECLARE @ID int
DECLARE @JoiningDt DATETIME
DECLARE @DocNum VARCHAR(4)
DECLARE @MyTableVar TABLE (
ID INT NOT NULL,
JoiningDt datetime,
DocNum VARCHAR(4),
NearestDate DATETIME);
DECLARE db_cursor CURSOR FOR
SELECT ID, JoiningDt, DocNum FROM Table1
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @ID, @JoiningDt, @DocNum
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO @MyTableVar
SELECT TOP 1 @ID, @JoiningDt, @DocNum,ClosestDt FROM table2
WHERE DocNum = @DocNum AND ClosestDt > @JoiningDt
AND NOT EXISTS (SELECT 1 FROM @MyTableVar WHERE NearestDate = ClosestDt)
ORDER BY ClosestDt ASC;
FETCH NEXT FROM db_cursor INTO @ID, @JoiningDt, @DocNum
END
CLOSE db_cursor
DEALLOCATE db_cursor
SELECT * FROM @MyTableVar ORDER BY ID
ID | JoiningDt | DocNum | NearestDate |
---|---|---|---|
1 | 2024-04-05 00:00:00.000 | A123 | 2024-04-07 00:00:00.000 |
2 | 2024-04-06 00:00:00.000 | A123 | 2024-04-08 00:00:00.000 |
3 | 2024-04-04 00:00:00.000 | B123 | 2024-04-05 00:00:00.000 |
4 | 2024-04-07 00:00:00.000 | A123 | 2024-04-09 00:00:00.000 |
5 | 2024-04-08 00:00:00.000 | A123 | 2024-04-11 00:00:00.000 |