I have a BQ table with duplicate (x2 times) rows of the same ad_id.
I want to delete old rows with ts > 120 min where there is a newer one with the same ad_id (Schema contains timestamp, ad_id, value. But there is not rowId
).
This is my try, is there a nicer way to do so?
DELETE FROM {table_full_name} o
WHERE timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 120 MINUTE) AND timestamp in (
SELECT MIN(timestamp)
FROM {table_full_name} i
WHERE i.ad_id=o.ad_id
GROUP BY ad_id)
Data example:
`ad-id` | `ts` | `value` |
`1` | Sep-1-2021 12:01 | `Scanned` |
`2` | Sep-1-2021 12:02 | `Error` |
`1` | Sep-1-2021 12:03 | `Removed` |
I want to clean it up to be:
`ad-id` | `ts` | `value` |
`2` | Sep-1-2021 12:02 | `Error` |
`1` | Sep-1-2021 12:03 | `Removed` |
I saw this post, but BQ doesn't support auto-increment for row-id
.
I saw this post. But how can I modify it without the ts interval (as it's unknown).
You can try this script. Used COUNT() with HAVING
to pull duplicate records with timestamp older than 120 minutes from current time using TIMESTAMP_DIFF
.
DELETE
FROM `table_full_name`
WHERE ad_id in (SELECT ad_id
FROM `table_full_name`
GROUP BY ad_id
HAVING COUNT(ad_id) > 1)
AND TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), timestamp, MINUTE) > 120
Before:
After: