MySQL page size documentation says:
For releases up to and including MySQL 5.5, the size of each InnoDB page is fixed at 16 kilobytes. This value represents a balance: large enough to hold the data for most rows, yet small enough to minimize the performance overhead of transferring unneeded data to memory. Other values are not tested or supported.
Starting in MySQL 5.6, the page size for an InnoDB instance can be either 4KB, 8KB, or 16KB, controlled by the innodb_page_size configuration option. As of MySQL 5.7.6, InnoDB also supports 32KB and 64KB page sizes. For 32KB and 64KB page sizes, ROW_FORMAT=COMPRESSED is not supported and the maximum record size is 16KB.
and then
Smaller page sizes can help performance with storage devices that use small block sizes, particularly for SSD devices in disk-bound workloads, such as for OLTP applications. As individual rows are updated, less data is copied into memory, written to disk, reorganized, locked, and so on.
How about using higher (32KB or 64KB) page size than the default 16KB? In what case should you do that and what you get as a benefit?
I will be setting up a new MySQL instance with traditional HDD and I was wondering if changing default 16KB could have impact on performance, efficiency of storage utilization.
So far I only found a drawback of using 32KB and 64KB page size:
For 32KB and 64KB page sizes, ROW_FORMAT=COMPRESSED is not supported and the maximum record size is 16KB.
Some people hit the max row size of about 8000 bytes. Switching to 32KB pages doubles that limit. However, switching to 64KB does not go past 16KB limit.
Since InnoDB blocks tend to be scattered around the disk, having bigger blocks will save slightly in arm motion on a HDD. I would expect this to be only single-digit percentage improvement. And it would vary depending on the type of activity. A freshly loaded table may show no improvement; a table with a lot of churn may show some.
If your dataset fits in the buffer_pool, then there is not much I/O to be done, so the block size does not come into play much.
The cost of arm motion is somewhat minimized by drivers or controllers that optimizer the order of disk operations. RAID with a cache does an especially good job, and it can make writes virtually instant. "Hole punching" probably adds to the frequency of arm motion. Classic tradeoff: "speed versus space".
If your dataset is too large to fit in RAM, and if you do a lot of "point queries", then a smaller block size would be better. But if you do a lot of table (or index) scans, then a larger block size has a slight benefit.
Keep in mind that all the data must use the same blocksize.
Of the thousands of forum Question I have seen, I don't think any changed the blocksize. You will be in uncharted waters.
Also note that, while ROW_FORMAT=COMPRESSED
saves some disk space, it chews up some RAM -- this is because some (all?) blocks are kept in RAM both compressed and uncompressed.
The numbers I have seen for that row_format is only 50%. Any decent compression algorithm compresses virtually any sort of text by about 2/3. So, if I am tempted to use compression, I compress the compressible columns (eg TEXT
, but not jpgs), and do it in the client, thereby offloading the CPU effort from the server. I believe (without any hard evidence) that this is a better way to compress your data. Also, I almost never use BIGINT
.
(Everything I say here is theoretical, based either in very limited MySQL documentation or the principles of HDDs.)
I have a Rule of Thumb for optimizations. "If the back-of-envelope calculations estimate less than 10% improvement, then move on -- look for something else to optimize."
So, I say "move on".