I have data set that looks like this:
internal_id | match_id | company_id | market_id | selection_id | odd_value | update_date |
---|---|---|---|---|---|---|
1 | 8576748 | 66 | 1 | 1 | 1.31 | 2021-01-05 17:59:41 |
36730 | 8576748 | 66 | 1 | 1 | 1.30 | 2021-01-05 18:58:41 |
47053 | 8576748 | 66 | 1 | 1 | 1.31 | 2021-01-05 19:59:41 |
2 | 8576748 | 66 | 1 | 2 | 14.00 | 2021-01-05 17:59:41 |
36731 | 8576748 | 66 | 1 | 2 | 14.00 | 2021-01-05 18:58:41 |
47054 | 8576748 | 66 | 1 | 2 | 14.00 | 2021-01-05 19:59:41 |
3 | 8576748 | 66 | 1 | 3 | 3.75 | 2021-01-05 17:59:41 |
36732 | 8576748 | 66 | 1 | 3 | 3.75 | 2021-01-05 18:58:41 |
47055 | 8576748 | 66 | 1 | 3 | 3.75 | 2021-01-05 19:59:41 |
I need to delete rows in which previous odd_value for the same match_id+company_id+market_id+selection_id is the same. In this example rows with internal_id: 36731, 47054, 36732, 47055 should be deleted.
As you can see it's not a duplication deletion, rows 1 and 47053 are the same (for match_id+company_id+market_id+selection_id), but they should remain as they are because odd_value 1,30 (for internal_id 36730) is different than previous odd_value and odd_value 1,31 (for internal_id 47053) is different than previous odd_value.
To list this data I used this query:
SELECT
`internal_id`,
`match_id`,
`company_id`,
`market_id`,
`selection_id`,
`odd_value`,
`update_date`
FROM
`odds`
WHERE
`match_id` = 8576748
AND `company_id` = 66
ORDER BY `match_id`, `company_id`, `market_id`, `selection_id`, `update_date`
here is how you can distinguish rows you want to delete and write the delete statement the same way :
SELECT *
,CASE WHEN LAG(odd_value) OVER (PARTITION BY match_id, company_id , market_id,selection_id ORDER BY update_date) = odd_value THEN 1 ELSE 0 END AS tobeDeleted
FROM `odds`