mysqlperformanceinnodbmysql-slow-query-log

Are full count queries really so slow on a large MySQL InnoDB tables?


We have a large tables with millions of entrys. A full count is pretty slow, see code below. Is this quite common for a MySQL InnoDB table? Is there no way to accelerate this? Even with the query cache it's still "slow". I also wonder, why the count on "communication" table with 2.8 mio entries is slower than the count on "transaction" with 4.5 mio entries.

I'know that it's much faster with a where clause. I just want to know if the bad performance is normal.

We are using Amazon RDS MySQL 5.7 with an m4.xlarge (4 CPU, 16 GB RAM, 500 GB Storage). I've also already tried bigger instances with more CPU and RAM, but there is no big change on the query times.

mysql> SELECT COUNT(*) FROM transaction;
+----------+
| COUNT(*) |
+----------+
|  4569880 |
+----------+
1 row in set (1 min 37.88 sec)

mysql> SELECT COUNT(*) FROM transaction;
+----------+
| count(*) |
+----------+
|  4569880 |
+----------+
1 row in set (1.44 sec)

mysql> SELECT COUNT(*) FROM communication;
+----------+
| count(*) |
+----------+
|  2821486 |
+----------+
1 row in set (2 min 19.28 sec)

Solution

  • In addition to what Bill says...

    Smallest index

    InnoDB picks the 'smallest' index for doing COUNT(*). It could be that all of the indexes of communication are bigger than the smallest of transaction, hence the time difference. When judging the size of an index, include the PRIMARY KEY column(s) with any secondary index:

    PRIMARY KEY(id),   -- INT (4 bytes)
    INDEX(flag),       -- TINYINT (1 byte)
    INDEX(name),       -- VARCHAR(255) (? bytes)
    

    For measuring size, the PRIMARY KEY has big since it includes (due to clustering) all the columns of the table. INDEX(flag) is "5 bytes". INDEX(name) probably averages a few dozen bytes. SELECT COUNT(*) will clearly pick INDEX(flag).

    Apparently transaction has a 'small' index, but communication does not.

    TEXT/BLOG columns are sometimes stored "off-record". Hence, they do not count in the size of the PK index.

    Query Cache

    If the "Query cache" is turned on, the second running of a query may be immensely faster than the first. But that is only if there were no changes to the table in the mean time. Since any change to the table invalidates all QC entries for that table, the QC is rarely useful in production systems. By "faster" I mean on the order of 0.001 seconds; not 1.44 seconds.

    The difference between 1m38s and 1.44s is probably due to what was cached in the buffer_pool -- the general caching area for InnoDB. The first run probably found none of the 'smallest' index in RAM so it did a lot of I/O, taking 98 seconds to fetch all 4.5M rows of that index. The second run found all that data cached in the buffer_pool, so it ran at CPU speed (no I/O), hence much faster.

    Good Enough

    In situations like this, I question the necessity of doing the COUNT(*) at all. Notice how you said "2.8 mio entries", as if 2 significant digits was "good enough". If you are displaying the count to users on a UI, won't that be "good enough"? If so, one solution to the performance is to do the count once a day and store it some place. This would allow instantaneous access to a "good enough" value.

    There are other techniques. One is to keep the counter updated, either with active code, or with some form of Summary Table.

    Throwing hardware at it

    You already found that changing the hardware did not help.