I'm learning searching and I want to know the average successful search for both linear and BST when each element is NOT equally likely to be searched. However, I haven't had much luck finding what to do in such cases :(
For example, I have a list of 3 elements (5, 4, 8) with probability of each element being searched is (0.1, 0.05, 0.05) respectively. What is the average number of elements inspected for a successful search if linear search or BST is performed?
The setup is that element i
has a value v[i]
, a probability p[i]
and a search for that element will do c[i]
comparisons.
In your example we are given
v = [5, 4, 8]
p = [0.1, 0.05, 0.05]
If you attempt a linear search, the number of comparisons will be
c = [1, 2, 3]
What you do is weight each count by the likelihood of doing it, divided by the overall likelihood of a successful search. That is:
print(sum([p[i] * c[i] for i in range(len(c))]) / sum(p))
In your case this comes out to 1.75
.
If you were to switch to a binary search, the counts change to c = [1, 2, 2]
and the calculation instead comes out to 1.5
. (Which makes sense, you have even odds of picking the root or a leaf.)
If you want to include the possibility of unsuccessful searches, then you just add counts and probabilities for the unsuccessful searches. So for linear you might get:
p = [0.1, 0.05, 0.05, 0.8]
c = [1, 2, 3, 3]
and now your answer is 2.75
. That's because you're mostly doing unsuccessful searches, which take 3
.
For an unbalanced binary search it gets more complicated because you have to include each missing range with a probability. But the principle remains the same.