
Improving quicksort to handle arrays with many duplicates, using "randomized swaps"


If an array of size n contains just one unique value (i.e., all elements are duplicates), then the traditional quicksort algorithm creates very uneven partitions of sizes 1 and n-1. This happens no matter how we choose the pivot element, making the running time O(n^2) on such inputs.
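For illustration (my own sketch, not from the original post), here is the standard Lomuto partition on an all-equal array; since no element is strictly less than the pivot, the "less than" side stays empty:

def lomuto_partition(arr: list[int]) -> int:
  # Classic Lomuto scheme: only strictly smaller elements move left.
  pivot = arr[-1]
  current_index = 0
  for i in range(len(arr) - 1):
    if arr[i] < pivot:
      arr[i], arr[current_index] = arr[current_index], arr[i]
      current_index += 1
  arr[-1], arr[current_index] = arr[current_index], arr[-1]
  return current_index

print(lomuto_partition([7, 7, 7, 7, 7]))  # 0: every other element ends up right of the pivot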

I thought of a simple way to alleviate this problem. While partitioning the array, if an element compares exactly equal to the pivot, we can choose with equal probability (a fair coin toss) to treat it as "less than" or "greater than" the pivot, so that duplicates get distributed evenly around the pivot.

import random

def partition(arr: list[int]) -> int:
  pivot = arr[-1]
  current_index = 0
  for i in range(len(arr) - 1):
    # On a tie with the pivot, flip a fair coin to decide which side
    # the element goes to, so duplicates spread evenly around the pivot.
    if arr[i] < pivot or (arr[i] == pivot and random.random() < 0.5):
      arr[i], arr[current_index] = arr[current_index], arr[i]
      current_index += 1
  # put the pivot in its place
  arr[-1], arr[current_index] = arr[current_index], arr[-1]
  # return the partition index
  return current_index

(Emphasis on the arr[i] == pivot and random.random() < 0.5 check)
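As a usage sketch (my addition, not part of the question), a quicksort built on this partition can recurse on the two sides; for brevity this version recurses on slice copies rather than sorting fully in place:

def quicksort(arr: list[int]) -> list[int]:
  # Base case: arrays of size 0 or 1 are already sorted.
  if len(arr) <= 1:
    return arr
  p = partition(arr)
  # Recurse on copies of the two sides and reassemble around the pivot.
  return quicksort(arr[:p]) + [arr[p]] + quicksort(arr[p + 1:])

print(quicksort([3, 1, 3, 3, 2, 3, 3]))  # [1, 2, 3, 3, 3, 3, 3]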

But I have never seen this approach used anywhere. Is there a problem with it that explains why it isn't widely used?


Solution

  • If an array of size n contains just one unique value (i.e., all elements are duplicates), then the traditional quicksort algorithm creates very uneven partitions of sizes 1 and n-1.

    This is only true when using the Lomuto partition scheme. With the Hoare partition scheme (the original quicksort scheme), the splits approach the ideal as the number of duplicates increases, although it needlessly swaps equal elements (a sketch follows after the link). See the "Repeated elements" section of the Wikipedia article:

    https://en.wikipedia.org/wiki/Quicksort
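
    A minimal sketch of the Hoare scheme (my own illustration, following the Wikipedia description): both scanning pointers stop on elements equal to the pivot, so duplicates get swapped across it and end up spread evenly over the two sides.

    def hoare_partition(arr: list[int], lo: int, hi: int) -> int:
      pivot = arr[(lo + hi) // 2]
      i, j = lo - 1, hi + 1
      while True:
        # Scan right past elements smaller than the pivot;
        # stop on anything >= pivot, including duplicates.
        i += 1
        while arr[i] < pivot:
          i += 1
        # Scan left past elements larger than the pivot.
        j -= 1
        while arr[j] > pivot:
          j -= 1
        if i >= j:
          return j
        # Equal elements are (needlessly) swapped across, which is
        # exactly what balances the two sides on duplicate-heavy input.
        arr[i], arr[j] = arr[j], arr[i]

    def quicksort_hoare(arr: list[int], lo: int, hi: int) -> None:
      if lo < hi:
        p = hoare_partition(arr, lo, hi)
        quicksort_hoare(arr, lo, p)      # element at index p stays on the left side
        quicksort_hoare(arr, p + 1, hi)

    data = [4, 4, 4, 4, 4, 4, 4, 4]
    quicksort_hoare(data, 0, len(data) - 1)
    print(data)  # sorted; each partition split this all-equal input in half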