I have a scenario where I am struggling to write the logic. I tried case and max functions, but it's not working.
I have a table as shown below, with key columns Ref1
and Ref2
. Now there is a scenario where I need to replace a value in one column with another column if the value in first column is null, but here the catch is this value is in a different row, and this need to be done only for the cases where TID in a group is 501 and 502 and TCD is 43. Within a group where we this TIDs 501 and 502 along with 43, there wont be any other id will be present
For example as shown the table for first two records, within a group (Ref1 and Ref2), I have TIDs with value 501 and 502 and TCD with values 43. Now in this case for both the rows TDATE have values. so in this case I can ignore the 2nd row and get only the first row as output.
Now if I take the ex. of row 6 and row 7, within a group (Ref1 and Ref2) I have TIDs with value 501 and 502 and TCD with values 43, but the TDATE for row 6 is NULL , so I need only row 7 as output and row 6 need to be ignored.
Now if I take the ex. of row 8 and row 9, within a group (Ref1 and Ref2) I have TIDs with value 501 and 502 and TCD with values 43, but the TDATE for row 9 is NULL , so anyway we need to take the row with TID 501 and other row can be ignored.
Rest all records should come as it is in the output.
Ref1 | Ref2 | TID | TCD | TDATE |
---|---|---|---|---|
800 | 900 | 501 | 43 | 2024-12-31 |
800 | 900 | 502 | 43 | 2024-12-22 |
345 | 934 | 544 | 35 | 2023-10-23 |
345 | 934 | 543 | 36 | 2023-11-21 |
700 | 876 | 501 | 43 | NULL |
700 | 876 | 502 | 43 | 2024-01-01 |
655 | 543 | 501 | 43 | 2024-04-25 |
655 | 543 | 502 | 43 | NULL |
434 | 465 | 411 | 37 | 2023-05-23 |
434 | 465 | 423 | 37 | 2024-06-12 |
Expected output is shown here:
Ref1 | Ref2 | TID | TCD | TDATE |
---|---|---|---|---|
800 | 900 | 501 | 43 | 2024-12-31 |
345 | 934 | 544 | 35 | 2023-10-23 |
345 | 934 | 543 | 36 | 2023-11-21 |
700 | 876 | 502 | 43 | 2024-01-01 |
655 | 543 | 501 | 43 | 2024-04-25 |
434 | 465 | 411 | 37 | 2023-05-23 |
434 | 465 | 423 | 37 | 2024-06-12 |
you can use CASE
and ROW_NUMBER
functions to achieve this. Here is the code:
WITH ranked_data AS (
SELECT
Ref1, Ref2, TID, TCD, TDATE,
ROW_NUMBER() OVER (PARTITION BY Ref1, Ref2 ORDER BY TDATE) AS rn,
COUNT(CASE WHEN TDATE IS NOT NULL THEN 1 END) OVER (PARTITION BY Ref1, Ref2) AS cnt
FROM your_table
)
SELECT
Ref1, Ref2, TID, TCD,
CASE
WHEN TDATE IS NULL AND cnt > 0 THEN (SELECT MAX(TDATE) FROM ranked_data WHERE Ref1 = rd.Ref1 AND Ref2 = rd.Ref2)
ELSE TDATE
END AS TDATE
FROM ranked_data rd
WHERE rn = 1 OR TDATE IS NOT NULL
ORDER BY Ref1, Ref2, TID;
ROW_NUMBER
function assigns a unique row number to each row
within each group of Ref1
and Ref2
.COUNT
window function counts the number of rows with non-NULL
TDATE
values within each group of Ref1
and Ref2
.rn = 1
or rows with non NULL
TDATE
values.CASE
statement on the outer query replaces the TDATE
value with the maximum TDATE
value from the same group.