mysqldatabasemariadbinnodbbufferpool

Why is InnoDB so painfully slow on full table scans even though entire data is in buffer pool?


Environment

my.ini

[mysqld]
default-storage-engine=INNODB
log-output=NONE
general-log=0
general_log_file="mariadb.log"
slow-query-log=0
query_cache_type=OFF
query_cache_size=0
innodb_buffer_pool_size=64G

DDL

CREATE TABLE testinnodb
(
    a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL,
    i FLOAT NOT NULL, j FLOAT NOT NULL, k FLOAT NOT NULL,
    x CHAR(20) NOT NULL, y CHAR(20) NOT NULL, z CHAR(20) NOT NULL
) ENGINE=InnoDB;

Same schema for MyISAM and Memory.

Tables are filled with 10M rows of random data, resulting data sizes:
InnoDB: 1.0 GB
MyISAM: 810 MB
Memory: 867 MB

SQL

SELECT * FROM testinnodb WHERE c=1;
SELECT * FROM testmyisam WHERE c=1;
SELECT * FROM testmemory WHERE c=1;

InnoDB: 2.4s !!!
MyISAM: 0.3s
Memory: 0.2s

The queries are run multiple times but performance stays the same. EXPLAIN gives the same output for all three queries (SIMPLE, USING WHERE).

This is clearly not an I/O issue, given the hardware and the performance of MyISAM and Memory in comparison.

64GB for the buffer pool is also more than enough to hold all of that table in memory.
The data must be in the buffer pool because disabling innodb_buffer_pool_load_at_startup, the query will take 4.2s on first run, and then 2.4s in subsequent runs.
innodb_buffer_pool_bytes_data will have grown by over 1GB after the first run, so it looks like the entire data is in fact in the buffer pool.
innodb_buffer_pool_read_requests does increase by about 10M on each execution.

Why the hell is reading the data using InnoDB from the buffer pool (i.e. RAM) 10 times slower than reading the same data using MyISAM (i.e. from SSD)?

I need help understanding what is going on. Surely this can't be right? I've tried playing around with the DB config (e.g. innodb_old_blocks_time=0, innodb_read_io_threads=32 and innodb_write_io_threads=32) but this literally changed nothing.

I know using an INDEX will improve things, but this is not the point.

Let me know if you need some status variables for debugging, I'm new to InnoDB so I'm not sure what was relevant to post here.

Output of SHOW ENGINE InnoDB STATUS; after startup and querying the InnoDB table twice

=====================================
Per second averages calculated from the last 50 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 50 srv_idle
srv_master_thread log flush and writes: 50
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 20001586
Purge done for trx's n:o < 20001583 undo n:o < 0 state: running
History list length 14
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION (000002727BD64108), not started 
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
71726 OS file reads, 2 OS file writes, 2 OS fsyncs
0.00 reads/s, 16413 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 8881303100
Log flushed up to   8881303100
Pages flushed up to 8881303100
Last checkpoint at  8881303088
0 pending log flushes, 0 pending chkp writes
4 log i/o's done, 0.08 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 68753031168
Dictionary memory allocated 424846000
Buffer pool size   4147712
Free buffers       4075870
Database pages     71842
Old database pages 26539
Modified db pages  0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 71711, created 131, written 0
1434.19 reads/s, 2.62 creates/s, 0.00 writes/s
Buffer pool hit rate 996 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 71842, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 0, updated 0, deleted 0, read 20000000
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 399992.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

Solution

  • There is quite a bit of overhead in the interface between InnoDB and the SQL interpreter of the MySQL or MariaDB server.

    In InnoDB, each access must be protected by a buffer pool page latch. A mini-transaction object will keep track of the acquired latches. Basically, for every fetched row, InnoDB will start a mini-transaction, look up the B-tree leaf page in the buffer pool, acquire the page latch, copy the data, and finally commit the mini-transaction and release the page latch.

    There are a couple of optimizations on top of this, but this is insufficient, and it would be better to implement MDEV-16232 to allow a mini-transaction to persist across the entire range scan. In that way, we would only acquire and release page latches when advancing to the next page.

    In range scans, a persistent cursor (btr_pcur_t) will store the current position. When the cursor position is restored at the start of the next mini-transaction (to fetch the next record), an optimistic restore will be attempted, with the assumption that the old pointer to the buffer pool page is still valid.

    InnoDB also implements a prefetch buffer. After 4 next-record read operations, InnoDB will copy 8 records at a time to the buffer, within a single mini-transaction. Subsequent requests will then be satisfied from this buffer. This mechanism would be made redundant by MDEV-16232 and should be removed as part of implementing it.

    Implementing MDEV-16232 would also speed up UPDATE and DELETE operations, by removing the need to acquire explicit record locks. If we continuously hold the page latch for the whole duration of deleting or updating a row, we can rely on implicit locking whenever no conflicts exist, just like we do in the INSERT case.