mysqlperformanceinnodb

How can I improve DELETE FROM performance on large InnoDB tables?


I have a fairly large InnoDB table which contains about 10 million rows (and counting, it is expected to become 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that is taking ages. This is the table structure:

 CREATE TABLE `problematic_table` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `taxid` int(10) unsigned NOT NULL,
    `blastdb_path` varchar(255) NOT NULL,
    `query` char(32) NOT NULL,
    `target` int(10) unsigned NOT NULL,
    `score` double NOT NULL,
    `evalue` varchar(100) NOT NULL,
    `log_evalue` double NOT NULL DEFAULT '-999',
    `start` int(10) unsigned DEFAULT NULL,
    `end` int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `taxid` (`taxid`),
    KEY `query` (`query`),
    KEY `target` (`target`),
    KEY `log_evalue` (`log_evalue`)
) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;

Queries that delete large chunks from the table are simply like this:

DELETE FROM problematic_table WHERE problematic_table.taxid = '57';

A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.

I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.

I have tried to INSERT ... SELECT those rows that I don't want to delete into a temporary table and just dropping the rest, but as the ratio of to-delete vs. to-keep shifts towards to-keep, this is no longer a useful solution.

This is a table that may see frequent INSERTs and SELECTs in the future, but no UPDATEs. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.

Could I improve my indexes on this table by limiting their length? Would switching to MyISAM help, which supports DISABLE KEYS during transactions? What else could I try to improve DELETE performance?

Edit: One such deletion would be in the order of about one million of rows.


Solution

  • This solution can provide better performance once completed, but the process may take some time to implement.

    A new BIT column can be added and defaulted to TRUE for "active" and FALSE for "inactive". If that's not enough states, you could use TINYINT with 256 possible values.

    Adding this new column will probably take a long time, but once it's over, your updates should be much faster as long as you do it off the PRIMARY as you do with your deletes and don't index this new column.

    The reason why InnoDB takes so long to DELETE on such a massive table as yours is because of the cluster index. It physically orders your table based upon your PRIMARY, first UNIQUE it finds, or whatever it can determine as an adequate substitute if it can't find PRIMARY or UNIQUE, so when one row is deleted, it now reorders your entire table physically on the disk for speed and defragmentation. So it's not the DELETE that's taking so long; it's the physical reordering after that row is removed.

    When you create a fixed width column and update that instead of deleting, there's no need for physical reordering across your huge table because the space consumed by a row and table itself is constant.

    During off hours, a single DELETE can be used to remove the unnecessary rows. This operation will still be slow but collectively much faster than deleting individual rows.