I have a simple quick select algorithm and want to understand why it doesn't work sometimes.
The question is to find Top K Frequent Elements
. I know there are other ways to do it such as using heap or bucket sorting. But I am curios why my algorithm doesn't work as I know there are other ways to do quick select which work. It sometimes doesn't work due to the random.choice
function which means any of the following can happen:
Why? Isn't quick select based on random function, how come other quick select algorithms that use Python functions for randomness work but this one doesn't. what do I miss?
inputs are a list of numbers as nums
and an integer k
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
histogram = Counter(nums) """ nums' frequency histogram/map"""
def quickSelect(keys, k, ri = []):
r = random.choice(keys) """ find the random key"""
pivot = histogram[r] """ get the random frequency based on the random key"""
left, right = [], [] """ put the keys in either right or left list based on their values"""
for key in keys:
if histogram[key] >= pivot:
right.append(key)
else:
left.append(key)
if k == len(right) + len(ri): """ if the size of right plus previous right (ri) is equal to k we found it"""
return right + ri
elif k < len(right) + len(ri):
quickSelect(right, k, ri)
else:
quickSelect(left, k - len(right) - len(ri), right+ri)
return quickSelect(list(histogram.keys()), k, [])
Update
it works per the accepted response suggestion. Added other way of solving it as comments as well for those who are interested.
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
####heap in O(nlogk)
# histogram = Counter(nums)
# return heapq.nlargest(k, histogram.keys(), key = histogram.get)
####bucket sort in O(n)
# bucket = [[] for _ in range(len(nums)+1)]
# histogram = Counter(nums)
# for n, f in histogram.items():
# bucket[f].append(n)
# res = []
# for i in range(len(bucket)-1, -1, -1):
# if bucket[i]:
# for n in bucket[i]:
# res.append(n)
# if len(res) == k:
# return res
# return None
#### quick select
histogram = Counter(nums)
def quickSelect(keys, k):
pivot = histogram[random.choice(keys)]
left, right = [], []
for key in keys:
if histogram[key] >= pivot:
right.append(key)
else:
left.append(key)
if k == len(right):
return right
elif k < len(right):
return quickSelect(right, k)
else:
return quickSelect(left, k - len(right)) + right
return quickSelect(list(histogram.keys()), k)
quickSelect(keys, k, ri = [])
does not explicitly return a value when it recurses - the return value will be None.
You don't need ri
: manipulate k
.
The same approach preventing a decent quicksort implementation from ever needing O(n) levels of recursion would work for quickSelect:
Do not recurse on the larger partition.
But check if then you need to recurse at all.