mongodbmongodb-queryb-treemongodb-indexesb-tree-index

How to estimate the average base of the logarithm for a MongoDB index lookup?


I’m trying to understand how the O(log N) complexity of a MongoDB index lookup translates to actual performance. MongoDB uses B-trees (B+trees) for indexes, and the logarithm’s base corresponds to the fanout (number of keys per node).

  1. I want to estimate the average base of the logarithm for a given index. Specifically:

  2. How can I determine or approximate the typical number of keys per index node in MongoDB?

  3. Given the number of entries in an index (e.g., 10 million), how can I use this to estimate the height of the B-tree or the number of node accesses needed for a lookup?

Are there practical rules of thumb or formulas to approximate the “effective base” of the log for different types of fields (integers, short strings, compound indexes)?

I’m looking for a way to reason about log(N) in MongoDB indexes more realistically, rather than just assuming a binary tree with base 2.

Thanks for any insights!


Solution

  • Let's assume that the B-tree has, on average, a entries in each node, and b child nodes per node:

    enter image description here

    In that case, the number of entries in the tree follows a finite geometric series with a ratio of b and the first term equalling a:

    N = a + ab + ab2 + ab3 + ... + abn

    = a(1-bn+1)/(1-b)

    N(1-b)/a = 1-bn+1

    bn+1 = 1 - N(1-b)/a

    n = logb(1 - N(1-b)/a) - 1

    Source: https://www.geeksforgeeks.org/dsa/b-tree-in-python/