sqlteradata

Replace null from another row


I have a scenario where I am struggling to write the logic. I tried case and max functions, but it's not working.

I have a table as shown below, with key columns Ref1 and Ref2. Now there is a scenario where I need to replace a value in one column with another column if the value in first column is null, but here the catch is this value is in a different row, and this need to be done only for the cases where TID in a group is 501 and 502 and TCD is 43. Within a group where we this TIDs 501 and 502 along with 43, there wont be any other id will be present

For example as shown the table for first two records, within a group (Ref1 and Ref2), I have TIDs with value 501 and 502 and TCD with values 43. Now in this case for both the rows TDATE have values. so in this case I can ignore the 2nd row and get only the first row as output.

Now if I take the ex. of row 6 and row 7, within a group (Ref1 and Ref2) I have TIDs with value 501 and 502 and TCD with values 43, but the TDATE for row 6 is NULL , so I need only row 7 as output and row 6 need to be ignored.

Now if I take the ex. of row 8 and row 9, within a group (Ref1 and Ref2) I have TIDs with value 501 and 502 and TCD with values 43, but the TDATE for row 9 is NULL , so anyway we need to take the row with TID 501 and other row can be ignored.

Rest all records should come as it is in the output.

Ref1 Ref2 TID TCD TDATE
800 900 501 43 2024-12-31
800 900 502 43 2024-12-22
345 934 544 35 2023-10-23
345 934 543 36 2023-11-21
700 876 501 43 NULL
700 876 502 43 2024-01-01
655 543 501 43 2024-04-25
655 543 502 43 NULL
434 465 411 37 2023-05-23
434 465 423 37 2024-06-12

Expected output is shown here:

Ref1 Ref2 TID TCD TDATE
800 900 501 43 2024-12-31
345 934 544 35 2023-10-23
345 934 543 36 2023-11-21
700 876 502 43 2024-01-01
655 543 501 43 2024-04-25
434 465 411 37 2023-05-23
434 465 423 37 2024-06-12

Solution

  • you can use CASE and ROW_NUMBER functions to achieve this. Here is the code:

    WITH ranked_data AS (
      SELECT 
        Ref1, Ref2, TID, TCD, TDATE,
        ROW_NUMBER() OVER (PARTITION BY Ref1, Ref2 ORDER BY TDATE) AS rn,
        COUNT(CASE WHEN TDATE IS NOT NULL THEN 1 END) OVER (PARTITION BY Ref1, Ref2) AS cnt
      FROM your_table
    )
    SELECT 
      Ref1, Ref2, TID, TCD, 
      CASE 
        WHEN TDATE IS NULL AND cnt > 0 THEN (SELECT MAX(TDATE) FROM ranked_data WHERE Ref1 = rd.Ref1 AND Ref2 = rd.Ref2)
        ELSE TDATE
      END AS TDATE
    FROM ranked_data rd
    WHERE rn = 1 OR TDATE IS NOT NULL
    ORDER BY Ref1, Ref2, TID;
    
    1. The ROW_NUMBER function assigns a unique row number to each row within each group of Ref1 and Ref2.
    2. The COUNT window function counts the number of rows with non-NULL TDATE values within each group of Ref1 and Ref2.
    3. The outer query selects only the rows with rn = 1 or rows with non NULL TDATE values.
    4. The CASE statement on the outer query replaces the TDATE value with the maximum TDATE value from the same group.