I have a fairly large InnoDB table which contains about 10 million rows (and counting; it is expected to grow to 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that takes ages. This is the table structure:
CREATE TABLE `problematic_table` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`taxid` int(10) unsigned NOT NULL,
`blastdb_path` varchar(255) NOT NULL,
`query` char(32) NOT NULL,
`target` int(10) unsigned NOT NULL,
`score` double NOT NULL,
`evalue` varchar(100) NOT NULL,
`log_evalue` double NOT NULL DEFAULT '-999',
`start` int(10) unsigned DEFAULT NULL,
`end` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxid` (`taxid`),
KEY `query` (`query`),
KEY `target` (`target`),
KEY `log_evalue` (`log_evalue`)
) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;
Queries that delete large chunks from the table are simply like this:
DELETE FROM problematic_table WHERE problematic_table.taxid = '57';
A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.
I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.
I have tried using INSERT ... SELECT to copy the rows that I don't want to delete into a temporary table and then dropping the rest, but as the ratio of to-delete vs. to-keep shifts towards to-keep, this is no longer a useful solution.
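For reference, what I tried looks roughly like this (the _keep table name is just for illustration):

CREATE TABLE problematic_table_keep LIKE problematic_table;
INSERT INTO problematic_table_keep SELECT * FROM problematic_table WHERE taxid != 57;
DROP TABLE problematic_table;
RENAME TABLE problematic_table_keep TO problematic_table;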
This is a table that may see frequent INSERTs and SELECTs in the future, but no UPDATEs. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.
Could I improve my indexes on this table by limiting their length? Would it help to switch to MyISAM, which supports DISABLE KEYS? What else could I try to improve DELETE performance?
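By "limiting their length" I mean prefix indexes, e.g. something like this (the 8-character prefix is an arbitrary example, not a tested value):

ALTER TABLE problematic_table DROP INDEX query;
ALTER TABLE problematic_table ADD INDEX query (query(8));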
Edit: One such deletion would be on the order of one million rows.
This solution can provide better performance once in place, but the migration itself may take some time.
A new BIT column can be added and defaulted to TRUE for "active" and FALSE for "inactive". If that's not enough states, you could use TINYINT with 256 possible values.
Adding this new column will probably take a long time, but once it's over, your updates should be much faster, as long as you run them off the PRIMARY KEY, as you do with your deletes, and don't index this new column.
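A minimal sketch of the idea, assuming a flag column named active (the name and the BIT type are illustrative choices, not requirements):

ALTER TABLE problematic_table ADD COLUMN active BIT(1) NOT NULL DEFAULT b'1';

-- instead of DELETE FROM problematic_table WHERE taxid = '57':
UPDATE problematic_table SET active = b'0' WHERE taxid = '57';

-- readers then filter out the soft-deleted rows:
SELECT COUNT(*) FROM problematic_table WHERE active = b'1';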
The reason InnoDB takes so long to DELETE on such a massive table as yours is the clustered index. InnoDB physically orders your table based on your PRIMARY KEY, the first UNIQUE index it finds, or whatever it can determine as an adequate substitute if it can't find either, so when a row is deleted, the table is physically reordered on disk for speed and defragmentation. So it's not the DELETE itself that takes so long; it's the physical reordering after the row is removed.
When you create a fixed-width column and update it instead of deleting, there's no need for physical reordering across your huge table, because the space consumed by a row and by the table itself stays constant.
During off hours, a single DELETE can then be used to remove the unnecessary rows. This operation will still be slow, but collectively much faster than deleting individual rows.
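With the hypothetical active flag from above, that cleanup is a single statement:

DELETE FROM problematic_table WHERE active = b'0';

-- optionally rebuild the table afterwards to reclaim the freed space:
OPTIMIZE TABLE problematic_table;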