I have a 'comments' table with an int auto_increment primary key and references to author and post (author_id, post_id).
Since I join with the 'posts' table very often (for obvious reasons) and rarely with 'authors' (only in the user history, which is rarely used), I'm considering removing the primary key status from 'comment_id' and creating a composite primary key (post_id, comment_id) instead. The idea is to fetch comments very quickly, since they would then be stored ordered by post thanks to the leading primary key column.
notes:
I hope to receive some interesting ideas.
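For reference, here is a minimal sketch of the layout the question describes, assuming `posts(post_id)` and `authors(author_id)` already exist as InnoDB tables with `INT` keys; `body` and `created_at` are hypothetical placeholder columns. One detail matters when `comment_id` stays `AUTO_INCREMENT`: InnoDB requires an `AUTO_INCREMENT` column to be the first column of some index, so once it is no longer the leading primary key column it needs its own key.

```sql
-- Sketch of the proposed design: comments clustered by post_id.
-- body and created_at are hypothetical placeholders for the real content columns.
CREATE TABLE comments (
  post_id    INT NOT NULL,
  comment_id INT NOT NULL AUTO_INCREMENT,
  author_id  INT NOT NULL,
  body       TEXT,
  created_at DATETIME,
  -- Clustered index: rows for the same post are stored next to each other.
  PRIMARY KEY (post_id, comment_id),
  -- Required because comment_id is AUTO_INCREMENT but not the first column of the PK.
  KEY (comment_id),
  FOREIGN KEY (post_id) REFERENCES posts(post_id),      -- PK already starts with post_id
  FOREIGN KEY (author_id) REFERENCES authors(author_id) -- InnoDB adds an index on author_id
) ENGINE=InnoDB;
```

Without the extra `KEY (comment_id)`, MySQL rejects the definition because the auto-increment column is not defined as a key.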
If your priority is to optimize joins from the `posts` table to the `comments` table, then it could be a small advantage to make the primary key of `comments` have a leading `post_id`.
```sql
CREATE TABLE comments (
  post_id INT,
  author_id INT,
  -- presumably some other columns for comment content, date, etc.
  PRIMARY KEY (post_id, author_id),
  FOREIGN KEY (post_id) REFERENCES posts(post_id),      -- PK is the index; no extra index is created
  FOREIGN KEY (author_id) REFERENCES authors(author_id) -- creates an index on author_id
);
```
When a join looks up rows in `comments` by `post_id`, it can use the primary key index, i.e. the clustered index, only if `post_id` is the leading column of the primary key.
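A quick way to see which index the join actually uses is `EXPLAIN`. A sketch, assuming `posts` has a `post_id` column and using 42 as a placeholder id: with the layout above, you would expect the row for `c` to show `PRIMARY` in the `key` column, whereas with the key columns reversed it would name a secondary index on `post_id` instead. On a nearly empty test table the optimizer may choose a different plan, so run it against realistic data.

```sql
-- Which index does the lookup into comments use?
-- 42 is a placeholder post_id.
EXPLAIN
SELECT p.post_id, c.author_id
FROM posts AS p
JOIN comments AS c ON c.post_id = p.post_id
WHERE p.post_id = 42;
```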
If the columns of the composite primary key were reversed, you could still do an efficient lookup by `post_id` as long as that column is indexed. InnoDB requires an index on the columns of every foreign key, and creates one automatically if no existing index lists those columns first.
```sql
CREATE TABLE comments (
  author_id INT,
  post_id INT,
  -- presumably some other columns for comment content, date, etc.
  PRIMARY KEY (author_id, post_id),
  FOREIGN KEY (post_id) REFERENCES posts(post_id),      -- creates an index on post_id
  FOREIGN KEY (author_id) REFERENCES authors(author_id) -- PK is the index; no extra index is created
);
```
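To confirm which indexes InnoDB actually created for this reversed layout, you can list them. For the table above you would expect the `PRIMARY` key on (`author_id`, `post_id`) plus one additional index on `post_id` added for the foreign key, and nothing extra for `author_id`, since it already leads the primary key.

```sql
-- One row per indexed column, including any index InnoDB added for a foreign key.
SHOW INDEX FROM comments;

-- The full table definition, showing the same indexes as KEY clauses.
SHOW CREATE TABLE comments;
```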
A lookup by `post_id` in this latter case is almost as good. The secondary index stores the entries for a given `post_id` together. If your query needs columns that are not in that index, InnoDB has to follow each entry's primary key value into the clustered index and load those pages. But it does this only for the subset of entries matched in the `post_id` index, so the overhead is small.
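That extra step can be seen by comparing two queries. InnoDB stores the primary key columns in every secondary index entry, so with `PRIMARY KEY (author_id, post_id)` the index on `post_id` already contains `author_id`. A sketch with 42 as a placeholder: the first query can typically be answered from the secondary index alone (`EXPLAIN` usually reports `Using index` in `Extra`), while the second, assuming the table has the other content columns the DDL comment mentions, must fetch each matching row from the clustered index.

```sql
-- Covered: post_id and author_id are both present in the secondary index entries.
EXPLAIN SELECT author_id FROM comments WHERE post_id = 42;

-- Not covered (assuming other content columns exist): each matching entry
-- needs an additional lookup by primary key into the clustered index.
EXPLAIN SELECT * FROM comments WHERE post_id = 42;
```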
Honestly, you're in the territory of micro-optimizations. It's good to be mindful of opportunities for optimization, but keep in mind that some optimizations, while they give a greater-than-zero benefit in the strict sense, may give so little benefit for your app that they aren't worth the change.
How can you know whether it's worthwhile? Try both designs in a test environment. Measure performance using the queries and data volume you expect for your application, then compare the results.
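One way to run that comparison in MySQL 8.0.18 or later is `EXPLAIN ANALYZE`, which executes the query and reports actual time and row counts for each plan step. A sketch, assuming two hypothetical test tables loaded with the same realistic data: `comments_by_post` with `PRIMARY KEY (post_id, author_id)` and `comments_by_author` with `PRIMARY KEY (author_id, post_id)`. The post id 42 is a placeholder.

```sql
-- Design 1 (hypothetical table): primary key leads with post_id.
EXPLAIN ANALYZE
SELECT p.post_id, c.*
FROM posts AS p
JOIN comments_by_post AS c ON c.post_id = p.post_id
WHERE p.post_id = 42;

-- Design 2 (hypothetical table): primary key leads with author_id,
-- so the lookup goes through the foreign-key index on post_id.
EXPLAIN ANALYZE
SELECT p.post_id, c.*
FROM posts AS p
JOIN comments_by_author AS c ON c.post_id = p.post_id
WHERE p.post_id = 42;
```

Repeating each run a few times (and with several different post ids) gives a fairer picture than a single measurement, since the first run may include reading pages from disk into the buffer pool.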
The difference in performance may be small enough that it's insignificant for your app.
For example, suppose the query for comments takes 18 milliseconds in one implementation, and 22 milliseconds with the other implementation. Your page load requirement is that it be under 100 milliseconds. In that case, both implementations satisfy the requirement, and changing one implementation for the other doesn't make a material difference.