I’m trying to understand how the O(log N) complexity of a MongoDB index lookup translates to actual performance. MongoDB uses B-trees (B+trees) for indexes, and the logarithm’s base corresponds to the fanout (the number of children per node, which is roughly the number of keys per node plus one).
I want to estimate the average base of the logarithm for a given index. Specifically:
How can I determine or approximate the typical number of keys per index node in MongoDB?
Given the number of entries in an index (e.g., 10 million), how can I use this to estimate the height of the B-tree or the number of node accesses needed for a lookup?
Are there practical rules of thumb or formulas to approximate the “effective base” of the log for different types of fields (integers, short strings, compound indexes)?
I’m looking for a way to reason about log(N) in MongoDB indexes more realistically, rather than just assuming a binary tree with base 2.
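To illustrate the gap I’m trying to quantify (the fanout of 100 here is just an assumed figure for comparison, not a measured MongoDB value):

```python
import math

N = 10_000_000           # index entries
print(math.log(N, 2))    # ≈ 23.3 "levels" if it really were a binary tree
print(math.log(N, 100))  # ≈ 3.5 node accesses with an assumed fanout of 100
```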
Thanks for any insights!
Let's assume that the B-tree has, on average, a entries in each node and b child nodes per node. In that case, the number of entries in the tree follows a finite geometric series with ratio b and first term a:
N = a + ab + ab² + ab³ + … + abⁿ
  = a(1 − bⁿ⁺¹)/(1 − b)

Solving for n:

N(1 − b)/a = 1 − bⁿ⁺¹
bⁿ⁺¹ = 1 − N(1 − b)/a
n = log_b(1 − N(1 − b)/a) − 1
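A quick sketch of that formula in Python. The values a = 99 and b = 100 are assumptions for illustration (a B-tree node with b children holds about b − 1 keys), not measured MongoDB internals:

```python
import math

def btree_height(N, a, b):
    """Estimate the height n of a B-tree holding N entries total,
    with on average a entries per node and b children per node,
    using n = log_b(1 - N(1-b)/a) - 1 from the derivation above."""
    return math.log(1 - N * (1 - b) / a, b) - 1

# 10 million entries, assumed fanout of 100 (so ~99 keys per node)
n = btree_height(10_000_000, a=99, b=100)  # ≈ 2.5
# A lookup descends from the root to a leaf, touching n + 1 nodes,
# so here roughly 3 to 4 node accesses.
```

Note how insensitive the result is to the exact parameters: since b is large, the formula is close to log_b(N·(b − 1)/a), so doubling or halving the fanout only shifts the height by a fraction of a level.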