google-bigquerydataform

Impute missing days with copy of last non-missing day in BigQuery


For some reason, I miss the ingestion of three days worth of data in a bigquery table. Now, I know that simply copying data from the last non-missing day is not the best way to impute missing data, but for my purposes, this is good enough.

I know that I could copy the last missing day, transform the date in pandas to DATE + 1, DATE +2 and so on and then append that data to the original table in bigquery. But, I would rather avoid having to do this. Is there a good and easy way to do this directly in bigquery or with dataform? I am not very comfortable with SQL.

Thanks for any given advice.


Solution

  • You can do the following. The query is self explanatory, but here is some details:

    INSERT INTO `<p>.<ds>.<t>` (date_col, data)  (
        SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col =  DATE_SUB(CURRENT_DATE(), interval 1 DAY)
        UNION ALL
        SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col =  DATE_SUB(CURRENT_DATE(), interval 2 DAY)
        UNION ALL
        SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col =  DATE_SUB(CURRENT_DATE(), interval 3 DAY)
    
    )