MySQL's documentation says:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
However, I want to know if there's any performance penalty for using this feature. Suppose I have a large table that stores when a user sees a comment:
CREATE TABLE `comment_views` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`comment_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `comment_views_comment_id_user_id_unique` (`comment_id`,`user_id`),
KEY `comment_views_user_id_foreign` (`user_id`),
CONSTRAINT `comment_views_comment_id_foreign` FOREIGN KEY (`comment_id`) REFERENCES `comments` (`id`),
CONSTRAINT `comment_views_user_id_foreign` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38821916 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
By specifying a composite key on (comment_id
, user_id
), and a non-composite key on user_id
, I am covered to search by comment_id
using the leftmost prefix rule. However, if there was an explicit key on just comment_id
, would lookups that just specify comment_id
(i.e. SELECT * FROM comment_views WHERE comment_id = 123
) be faster?
Does that answer change if the composite key wasn't unique, the column is a string, or if the column is nullable? How about if I am instead using a range, join, dependent subquery, covering index query, or some other index access?
I tried running EXPLAIN with and without the extra index, and the explain didn't change other than the possible_keys
showing my new index. In fact, it seemed like the EXPLAIN preferred the composite key. Maybe because it was first in the table definition or it just prefers a unique key (even though the leftmost prefix isn't unique). I don't know. I was unable to find more information on how MySQL actually achieves using the leftmost prefix under the hood.
The difference is so minor as to be ignorable.
Debating points:
INDEX(col1)
will probably never be used, but INDEX(col1, col2)
could be quite useful.Other notes:
BIGINT
needed. For example, might you really have 2 billion users?id
really needed? Or would PRIMARY KEY(user_id, comment_id)
suffice? (Or the opposite order.) That may help for several reasons.FOREIGN KEY
will create an INDEX
if needed. So, create the composite indexes before declaring the FKs.