algorithmbinary-search-treelinear-search

How to find average successful search when elements are NOT equally likely to be searched (linear and BST)


I'm learning searching and I want to know the average successful search for both linear and BST when each element is NOT equally likely to be searched. However, I haven't had much luck finding what to do in such cases :(

For example, I have a list of 3 elements (5, 4, 8) with probability of each element being searched is (0.1, 0.05, 0.05) respectively. What is the average number of elements inspected for a successful search if linear search or BST is performed?


Solution

  • The setup is that element i has a value v[i], a probability p[i] and a search for that element will do c[i] comparisons.

    In your example we are given

    v = [5, 4, 8]
    p = [0.1, 0.05, 0.05]
    

    If you attempt a linear search, the number of comparisons will be

    c = [1, 2, 3]
    

    What you do is weight each count by the likelihood of doing it, divided by the overall likelihood of a successful search. That is:

    print(sum([p[i] * c[i] for i in range(len(c))]) / sum(p))
    

    In your case this comes out to 1.75.

    If you were to switch to a binary search, the counts change to c = [1, 2, 2] and the calculation instead comes out to 1.5. (Which makes sense, you have even odds of picking the root or a leaf.)

    If you want to include the possibility of unsuccessful searches, then you just add counts and probabilities for the unsuccessful searches. So for linear you might get:

    p = [0.1, 0.05, 0.05, 0.8]
    c = [1, 2, 3, 3]
    

    and now your answer is 2.75. That's because you're mostly doing unsuccessful searches, which take 3.

    For an unbalanced binary search it gets more complicated because you have to include each missing range with a probability. But the principle remains the same.