algorithmrandomsequenceprobabilityrandomized-algorithm

Runtime of randomization algorithm to find majority element in an array?


This is for the leetcode problem 169. Majority Element. Given an array of numbers, where there is a guarantee that there is a number that exists in the array greater than floor(n/2), one can find such a number by selecting a random index and checking if it's value is the majority element or not, and repeat until it's found. This is one of the solutions provided.

class Solution:
    def majorityElement(self, nums: List[int]) -> int:
        majority_count = len(nums)//2

        while True:
            candidate = random.choice(nums)
            if sum(1 for elem in nums if elem == candidate) > majority_count:
                return candidate

What I'm confused about is the runtime analysis of this algorithm, which is based on the assertion that since the majority element is always guaranteed to occupy more than half of the array, this algorithm applied to our scenario will always be faster than if it were applied to the scenario where there existed only an element that occupied exactly half the array. Therefore the expected number of iterations EV_(itersprob) can be represented as being less than or equal to the expected number of iterations of the modified scenario EV_(itersmod)

enter image description here

What I'm confused about is how this series was put together for this problem. The variable i is the iteration number, but why are you summing for (i * (1/2^i))?

Basically the runtime for the number of iterations of the while loop is expected to be constant, making the runtime of this entire algorithm linear (due to the inner loop running n times each time)


Solution

  • The probability that we guess the correct element on the first run is p = 1/2.

    The probability that we guess the correct element on the second run is: we guessed wrong on the first run, but right on the second run. So, q * p = 1/2 * 1/2 = 1/4.

    The probability that we guess the correct element on the third run is: we guessed wrong on the first run, wrong on the second run, but right on the third run. So, q * q * p = 1/2 * 1/2 * 1/2 = 1/8.

    So, the expected number of runs is: 1 * 1/2 + 2 * 1/4 + 3 * 1/8 + .... Exactly like the formula says.

    After that, we can intuitively note that, if the probability p > 1/2, the expected time will be even less.