mysqlloopsupdates

Performance problems with updating/deleting >100K rows with MySQL


  1. Imagine there is a table with 1M rows with categories 1-100.
  2. I need to update rows where f.e. category=10 (there are let's say 150k rows).
  3. I will update 120k rows and need to delete 30k rows.

First idea: Currently I am using on beginning update of all rows to 0 and on update change this value to 1. Then delete all rows where category=10 and update=0.

There is problem with performance to update 150k rows to 0 where category=10. Sometimes it takes 30s because there could be 200k rows not only 30k.

Second idea On the beginning, loop all 150k rows to keep id's in array, then fill a new array with updated ids and at the end use array_diff to get the remaining ids to delete.

There is also problem with performance to make sql like "... where id in (...30k ids...)".

Do you guys using something better to solve this work? Thanks.


Solution

  • Variation on your first idea: Define your flag column as a timestamp instead of a boolean, then you don't have to take 30 seconds to initialize it to 0. Just update that timestamp to NOW() as you update rows. Once you are done, any rows where the flag column is older than your first updated row should be deleted. I'm assuming this update/delete task will be done again periodically, but the timestamp should still work as long as the tasks do not overlap.

    Variation on your second idea: Don't run a query DELETE FROM imagine WHERE id IN(...30k ids...) predicate. Instead, you may run a series of DELETE FROM imagine WHERE id IN (...100 ids...). Loop over your list of id's and delete in batches of 100 at a time. You'll need to run 300 DELETE statements this way, but it's easy to write the loop.