Suppose I have three columns, A, B, C. They each have a range of x, y and z possible values respectively.
Does an index on all three columns have a size proportional to x * y * z?
No. The size of an INDEX
is (roughly)
N * L + overhead
N = Number of rows in the entire table.
L = Length (in bytes) of the values in all the columns of the index, plus columns in the PRIMARY KEY
.
overhead = various pointer, lengths, padding, etc
Example: CREATE TABLE ... id INT PRIMARY KEY, A INT, INDEX(A) ...
INT
is a 4-byte datatype. It can hold more than 4 billion distinct values. If there are 100 rows in the table, let's look at the BTree holding the secondary INDEX(A)
.
N = 100
L = 4 + 4 -- that bytes, not billions of bytes
N * L = 800, but once the overhead is added, and use the blocking, it will take 16KB. (Note: InnoDB allocates data and indexes in "blocks" of 16KB.)
Now add to that table
city VARCHAR(100), -- average length 10 characters
INDEX(city, A)
N = 100 -- still assuming 100 rows
L = (2+10) + 4 + 4 = 16
total = again, only 1-2 blocks.
The (2+10)
: 2 for the "length" of the string; 10, on average, for the actual string. (In some cases, the "2" is really "1" and if you are using utf8, each character could be multiple bytes.)
If that table grows to 1 million rows, the index may take 50MB, a lot of it being unavoidable "overhead".
A major exception:
For InnoDB, the size of the PRIMARY KEY
is virtually zero since it is "clustered" with the data. Actually, there is about 1% extra for the non-leaf nodes in that BTree and some 'overhead'.