mysqlindexingmariadbclustered-index

Character and integer performance differences in MySQL clustered index


I would like to know the difference between character type and integer type when configuring MySQL clustered index.

If you look at the principles of clustered index, it's a sequential configuration. If I configure PK with strings and add data, is it correct that insertion occurs in the middle of the index and this causes index page segmentation, resulting in poor performance?

If you're familiar with clustered indexes, please explain.


Solution

  • It's true that choosing a sequential primary key has an advantage. Appending to the end of a B-tree (which the clustered index is) results in a more efficiently stored index, and therefore you get more records in fewer pages. This allows the buffer pool to store more of your table in less RAM, and so on.

    There's a good Percona blog that illustrates the difference with some graphical maps of the clustered index: Illustrating Primary Key models in InnoDB and their impact on disk usage.

    I believe the same benefit would apply for a character-based primary key, if you were inserting rows in sequential order.

    But the less efficient method, whether it's for a character or integer key, would be to insert in random order.

    In my experience, this difference (sequential insertion order vs. random insertion order) is a greater factor for performance than the difference between character and integer alone.

    That's not to say there is no measurable difference at all, but I think the difference is small enough that it's in the realm of micro-optimization, and thus won't give enough benefit to justify the choice. You'd be better off choosing character or integer as your primary key for logical reasons, not for performance reasons.