pythonbinary-searchsortedlist

Find if any number appears more than n/4 times in a sorted array


I was asked the following in an interview:

Given a sorted array with n numbers (where n is a multiple of 4), return whether any number appears more than n/4 times.

My initial thought was to iterate over the array and keep a counter:

limit = len(nums) // 4
counter = 1

for i in range(1, len(nums)):
    if nums[i] == nums[i-1]:
        counter += 1

        if counter > limit:
            return True
    else:
        counter = 1

return False

However the interviewer asked if I could do better. Sorted array, better than linear complexity, I immediately thought "binary search"... I explored this path but was clueless.

After a few minutes, he gave a hint: "if any i exists where nums[i] == nums[i + len(nums) / 4]", does that mean we should return true?

I was trying to apply this to my binary search and was not thinking straight so the hint went over my head... Retrospectively it is trivial, however I only see how to apply this with a sliding window:

limit = len(nums) // 4

for i in range(limit, len(nums)):
    if nums[i] == nums[i-limit]:
        return True

return False

Is it what the interviewer meant? Or is there a better, non-linear solution?


Solution

  • A number that appears more than n/4 times in nums must be one of the elements of nums[::len(nums)//4]. For each candidate number, we perform a binary search for the first position where it appears, and check whether the same number appears len(nums)//4 spots later:

    import bisect
    def check(nums):
        n = len(nums)
        assert n % 4 == 0
    
        for i in nums[::n//4]:
            first_appearance = bisect.bisect_left(nums, i)
            if first_appearance + n//4 < len(nums) and nums[first_appearance + n//4] == i:
                return True
        return False
    

    We perform at most 4 binary searches, and at most a constant amount of other work, so the time complexity is O(log n).