I am trying to figure out the QuickSort algorithm. I have tried different implementations. The problem I have relates to the inclusion/exclusion of a pivot element.
Consider these three implementations.
This code would cause an error in VS code and VS code just stops. Notice that the pivot element (ind
) is included only once in left
(at least it seems this way to me) and it is not added separately in the last line of the function:
def quick_sort(arr):
if len(arr)<=1:
return arr
ind = arr[0]
left = [i for i in arr if i <= ind]
right = [i for i in arr if i > ind]
return quick_sort(left) + quick_sort(right)
If I exclude ind
from left
(using <
instead of <=
) and I add [ind]
in the last line, it works. BUT the sorted list won't have any duplicates from the original list:
numbers = [5, 2, 3, 2, 7]
print("Sum of numbers:", quick_sort(numbers))
def quick_sort(arr):
if len(arr)<=1:
return arr
ind = arr[0]
left = [i for i in arr if i < ind]
right = [i for i in arr if i > ind]
return quick_sort(left) + [ind] + quick_sort(right)
Finally, this implementation works just fine sorting all values properly. BUT I don't understand what's the difference between this implementation and the first one. Why the first one causes an error, while this one works? Yet in both cases we include the pivot (ind
) just once?
numbers = [5, 2, 3, 2, 7]
print("Sum of numbers:", quick_sort(numbers))
def quick_sort(arr):
if len(arr)<=1:
return arr
ind = arr[0]
left = [i for i in arr if i < ind]
middle = [i for i in arr if i == ind]
right = [i for i in arr if i > ind]
return quick_sort(left) + middle + quick_sort(right)
I checked the previous Q&A on this topic, but didn't find anything on this problem.
Implementation #1 will suffer from infinite recursion when left
takes all values in the given list (and right
remains empty). In that case there is no progress and the recursive call will solve the exact same problem again, leading to the same repeated recursive call...etc. This happens for instance with input [2, 1].
You already correctly identified the problem with the second implementation. If the pivot value has duplicates, they are removed from the result. This problem is solved by the third implementation.
Both the second and third implementation solve the problem that the first implementation has: it is guaranteed that the recursive calls will deal with strictly shorter lists, because it is guaranteed that one or all pivot value(s) are no longer member of the list that is passed to the recursive call.
Unrelated to your question, but quicksort is commonly implemented as an inplace algorithm, i.e. without the creation of new lists during the process. Instead the given list is mutated through swaps. See Wikipedia on Quicksort.