binary-treebig-ocomplexity-theory

Complexity of Perfectly Balanced Binary Tree


My scenario is a perfectly balanced binary tree containing integers.

I've searched and found many explanations of best/worst case scenarios for binary trees. Best case is O(1) (target found in root), and worst is O(log(n)) (height of the tree).

I have found little to no information on calculating average complexity. The best answer I could find was O(log(n)) - 1, but I guess I don't quite understand (if correct) how this average case is calculated.

Also, would searching for an integer not in the tree yield the same complexity ? I think it would, but any insight is appreciated.


Solution

  • Lets say we have a perfect balanced binaray tree containing n = 2k integers, so the depth is log₂(n) = k.

    The best and worst case is, as you say, O(1) and O(log(n)).

    Short way

    Lets pick a random integer X (uniform distributed) from the binary tree. The last row the tree contains the same number of integers as the first k-1 rows together. With probability 1/2 X is in the frist k-1 rows, so we need at most O(k-1) = O(log(n)-1) steps to find it. And also with probability 1/2 X is in the last row, where we need O(k) = O(log(n)) steps.

    In total we get

    E[X] ≤ P(row of X ≤ k-1)⋅O(log(n)-1) + P(row of X = k)⋅O(log(n)) 
         = 1/2⋅O(log(n)-1) + 1/2⋅O(log(n))
         = 1/2⋅O(log(n)-1) + 1/2⋅O(log(n)-1)
         = O(log(n)-1)

    Notice: This is a little ugly but in O-notation O(x) and O(x±c) is the same for any constant value c.

    Long way

    Now lets try to calculate the average case for a random (uniform distributed) integer X containd in the tree and lets name the set of integers on the i-th "row" of the tree Ti. Ti contains 2i Elements. T0 denotes the root.

    The probability of picking an integer in the i-th row is P(X ∈ Ti) = 2i/n = 2i-k.

    To find an integer on row i it take O(2i) = O(i) steps.

    So the expected number of steps is

    E[X] = Σi=0,...,k-1 O(i)⋅2i-k.

    To simplify this we use

    O(i)⋅2i-k + O(i+1)⋅2i+1-k ≤ O(i)⋅2i+1-k + O(i+1)⋅2i+1-k ≤ O(i+1)⋅2i+2-k

    This leads us to

    E[X] = Σi=0,...,k-1 O(i)⋅2i-k ≤ O(k-1)⋅2⁰

    Since k = log(n), we see that the average case is in O(log(n)-1) = O(log(n)).

    Values not in the tree

    If the value is not in the tree, you have to walk through the whole tree. After log(n) steps you have found a leaf. If the value equals your input, you have found what you seached for. If not, you know, that the value you searched for is not containd in the tree. So if you seach for a value that is not in the tree it will take O(log(n)).