[SOLVED] How to estimate the average base of the logarithm for a MongoDB index lookup?

How to estimate the average base of the logarithm for a MongoDB index lookup?

I’m trying to understand how the O(log N) complexity of a MongoDB index lookup translates to actual performance. MongoDB uses B-trees (B+trees) for indexes, and the logarithm’s base corresponds to the fanout (number of keys per node).

I want to estimate the average base of the logarithm for a given index. Specifically:
How can I determine or approximate the typical number of keys per index node in MongoDB?
Given the number of entries in an index (e.g., 10 million), how can I use this to estimate the height of the B-tree or the number of node accesses needed for a lookup?

Are there practical rules of thumb or formulas to approximate the “effective base” of the log for different types of fields (integers, short strings, compound indexes)?

I’m looking for a way to reason about log(N) in MongoDB indexes more realistically, rather than just assuming a binary tree with base 2.

Thanks for any insights!

Solution

Let's assume that the B-tree has, on average, a entries in each node, and b child nodes per node:

In that case, the number of entries in the tree follows a finite geometric series with a ratio of b and the first term equalling a:

N = a + ab + ab² + ab³ + ... + abⁿ

= a(1-bⁿ⁺¹)/(1-b)

N(1-b)/a = 1-bⁿ⁺¹

bⁿ⁺¹ = 1 - N(1-b)/a

n = log_b(1 - N(1-b)/a) - 1

Source: https://www.geeksforgeeks.org/dsa/b-tree-in-python/