I'm trying to write my own hoare partition function to better understand it. I thought I followed its definition and pseudo-code well but even though it seems to work as expected on many occasions, it falls apart and goes into infinite loop when a list with multiple values equal to pivot are passed in. Where is my mistake, how should I modify this to fix the bug?
def partition(lst, left_index, right_index):
pivot = lst[right_index]
while True:
#Increment left index until value at that index is greater or equal to pivot
while True:
if lst[left_index] >= pivot: break
left_index += 1
#Increment right index until value at that index is less or equal to pivot
while True:
if lst[right_index] <= pivot: break
right_index -= 1
if left_index < right_index:
lst[left_index], lst[right_index] = lst[right_index], lst[left_index]
else:
return right_index
return partition(0, end)
You are testing for equality with pivot value in both lst[left_index] >= pivot
and lst[right_index] <= pivot
. This effectively prevents both indices from skipping over pivot-valued elements. Therefore, when two or more pivot-valued elements are pushed toward the middle of your list, left_index
and right_index
get separated by an unsurmountable barrier. Delete the equal sign in one of those break
ing lines, and the non-halting problem will be gone.
However, as a result of this change the loop that moves left_index
may push it by one position above right_index
and even go out of bounds when right_index
stays at its initial position. Similarly, the loop that moves right_index
may push it by one position below left_index
and even go out of bounds when left_index
stays at its initial position. To prevent that from happening you must change while True:
in those loops to while left_index < right_index:
.
Note that the partitioning will be slightly different depending on whether you remove the equality check for left_index
or right_index
. It matters for the boundary cases, when the pivot element turns out to be the smallest or the largest value in the list. Taking into account that in the beginning right_index
represents an internal position with respect to the input range and left_index
points to a boundary position, you must allow left_index
to skip over pivot values whereas right_index
must be instructed to stop at pivot values (the opposite logic would enable left_index
staying at its initial position till the end of the algorithm, while right_index
could be pushed all the way down to left_index
, producing no partitioning whatsoever).
Thus the corrected code will be this:
def partition(lst, left_index, right_index):
pivot = lst[right_index]
while True:
while left_index < right_index and lst[left_index] <= pivot:
left_index += 1
while left_index < right_index and lst[right_index] > pivot:
right_index -= 1
if left_index < right_index:
lst[left_index], lst[right_index] = lst[right_index], lst[left_index]
else:
return right_index