mysqlsqlnullsql-viewlocf

Removing NULLs in sequential data - MYSQL


I have a database for tracking claims payments. There's a table for claims claim, a table for monthly payments claim_month and a table defining each month month. month has each entry in order so that if month_id[1] > month_id[2] then the second figure is earlier than the first figure.

Using the query (the randomisation of paid_to_date is added for privacy purposes):

SELECT
claim.claim_id,
m.month_id,
claim_month_id,
IF (claim_month.paid_to_date IS NOT NULL, ROUND(RAND(1) * 100), NULL) AS paid_to_date
FROM
    claim
    INNER JOIN ( SELECT DISTINCT month_id FROM claim_month ) AS m
    LEFT JOIN claim_month ON claim.claim_id = claim_month.claim_id 
    AND m.month_id = claim_month.month_id

I get the following data.

INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (25, 1004, 8584, 41);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (25, 1005, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (25, 1006, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (25, 1007, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (21, 1004, 8580, 87);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (21, 1005, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (21, 1006, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (21, 1007, NULL, NULL);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (5, 1004, 8564, 14);
INSERT INTO ``(`claim_id`, `month_id`, `claim_month_id`, `paid_to_date`) VALUES (5, 1005, 8627, 9);

Data visualised

From here, I need to replace NULLs with the latest non-null observation for each claim_id.

I've spent the whole morning looking through previous questions however most of them are for PostgresSQL which isn't particularly helpful in this context. What am I missing?


Solution

  • I've worked out a solution, but I'm not convinced it's the best. I suspect that for larger databases, this would be quite demanding. It works in the meantime however.

    I've essentially joined the table onto itself repeatedly wherever a record is earlier and on the same claim using something similar to the following:

    SELECT 
        b.claim_id,
        b.month_id,
        b.claim_month_id,
        claim_month.claim_month_id AS claim_month_id_latest
    
    FROM
    
    (SELECT
        a.claim_id,
        a.month_id,
        a.claim_month_id,
        MAX(claim_month.month_id) AS source_month_id
    
    FROM
        (
        SELECT
            claim.claim_id,
            m.month_id,
            claim_month_id
        FROM
            claim
            INNER JOIN ( SELECT DISTINCT month_id FROM claim_month ) AS m
            LEFT JOIN claim_month ON claim.claim_id = claim_month.claim_id 
            AND m.month_id = claim_month.month_id 
    
        ) AS a
        LEFT JOIN claim_month ON a.claim_id = claim_month.claim_id 
                                                    AND a.month_id >= claim_month.month_id
    
    GROUP BY
        a.claim_id, a.month_id) AS b
        LEFT JOIN claim_month ON b.claim_id = claim_month.claim_id AND b.source_month_id = claim_month.month_id
    
    
    ORDER BY b.claim_id, b.month_id