For some reason, I miss the ingestion of three days worth of data in a bigquery table. Now, I know that simply copying data from the last non-missing day is not the best way to impute missing data, but for my purposes, this is good enough.
I know that I could copy the last missing day, transform the date in pandas to DATE + 1, DATE +2 and so on and then append that data to the original table in bigquery. But, I would rather avoid having to do this. Is there a good and easy way to do this directly in bigquery or with dataform? I am not very comfortable with SQL.
Thanks for any given advice.
You can do the following. The query is self explanatory, but here is some details:
use the DATE_ADD()
and DATE_SUB()
to modify the data returned and to filter the day you want to copy from.
Use the union to return a single table many times with different modification and filters
Use the insert as described following to insert the retrieved data in the table.
Before run the insert, run only the selects and unions to check if that is the data you want
I've returned data from 1, 2 and 3 days ago (date_col = DATE_SUB(CURRENT_DATE(), interval 2 DAY))
and added 1 day on if date field.
INSERT INTO `<p>.<ds>.<t>` (date_col, data) (
SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col = DATE_SUB(CURRENT_DATE(), interval 1 DAY)
UNION ALL
SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col = DATE_SUB(CURRENT_DATE(), interval 2 DAY)
UNION ALL
SELECT DATE_ADD(date_col, INTERVAL 1 DAY) as date, data FROM `<p>.<ds>.<t>` where date_col = DATE_SUB(CURRENT_DATE(), interval 3 DAY)
)