python, sorting, time-complexity, bubble-sort

How to properly count operations (time complexity) in a bubble sort algorithm in Python


This is my code:

def sort_bubble(list):
    comp_counter = 0
    swap_counter = 0
    n = len(list)
    for i in range(n-1):
        for j in range(n-i-1):
            comp_counter += 1
            if list[j] > list[j+1]:
                swap_counter += 1
                list[j], list[j+1] = list[j+1], list[j]
    return comp_counter, swap_counter

It needs to bubble sort a list, and then return the proper number of operations (time complexity).

I ran this code in two cases (each list had 10000 integers):

a) Sorting an already sorted list (best case scenario): Number of comparisons = 49995000 = n(n-1)/2; Number of swaps = 0. Seems to me like something is wrong here, because the number of comparisons in a best case scenario should be O(n) and the number of swaps O(1) (for a bubble sort algorithm).

b) Sorting a "backwards" sorted list (worst case scenario): Number of comparisons = 49995000 = n(n-1)/2; Number of swaps = 49993762. In a worst case scenario this algorithm should have O(n^2) swaps and O(n^2) comparisons. Meanwhile the number of swaps and comparisons is something closer to O(n^2)/2.

It is obvious to me that either my code, or my understanding of time complexity in algorithms is completely scuffed. Probably both. What am I doing wrong?


Solution

  • return the proper number of operations (time complexity).

    The number of operations you get from running a program is not equivalent to its time complexity. Time complexity is something you derive from an analysis proving that this number of operations is bounded (up to a constant factor) by a function of 𝑛 for all large enough values of 𝑛.

    best case scenario ... worst case scenario: Number of comparisons = 49995000 = n(n-1)/2

    The comparison counter is increased in a way that does not depend on the contents of the input list. Its final value only depends on 𝑛. And so there is no best or worst case scenario for it. It will always end up as 𝑛(𝑛−1)/2.

    If you expected to see a difference, then this means you were supposed to implement a version of bubble sort that detects that the input list is sorted and can exit earlier. See also Wikipedia on bubble sort, and how in one version it uses a swapped boolean variable, and in another n and newn. That latter version identifies in a given iteration of the outer loop where the last swap occurs, and will conclude that all values from that index onwards are in their final, sorted position and do not need to be compared again.
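    To make the simpler of those two variants concrete, here is a sketch of the swapped-flag version with the same counters (bubble_sort_flagged is a name chosen here for illustration, not from the question):

```python
def bubble_sort_flagged(lst):
    """Bubble sort that stops as soon as a full pass makes no swaps."""
    n = len(lst)
    comparisons = swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for j in range(n - 1):
            comparisons += 1
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swaps += 1
                swapped = True
        n -= 1  # the largest remaining element has bubbled to the end
    return comparisons, swaps
```

    On an already sorted list the while loop runs a single pass of n−1 comparisons, sees no swaps, and exits, which is exactly the O(n) best case the question expected.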

    Unrelated, but don't call your list list, as that shadows the built-in Python type list.
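    A quick illustration of why that shadowing bites (bad is a throwaway name for this sketch):

```python
def bad(list):
    # Inside this function the parameter shadows the built-in `list`,
    # so using `list` as a constructor now fails.
    return list(range(3))

try:
    bad([1, 2, 3])
    shadowed = False
except TypeError:
    # calling the list object [1, 2, 3] raises:
    # TypeError: 'list' object is not callable
    shadowed = True
print(shadowed)
```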

    Here is how your code could be adapted:

    def sort_bubble(lst):
        comp_counter = 0
        swap_counter = 0
        n = len(lst)
        sorted_from = n
        while sorted_from > 0:
            last_swap_index = 0
            for j in range(sorted_from - 1):
                comp_counter += 1
                if lst[j] > lst[j+1]:
                    swap_counter += 1
                    lst[j], lst[j+1] = lst[j+1], lst[j]
                    last_swap_index = j + 1
            sorted_from = last_swap_index
        return comp_counter, swap_counter
    

    For the best case scenario you'll now get a number of comparisons that is the minimum possible.
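    As a quick sanity check with a smaller list (the adapted function is repeated here so the snippet runs on its own):

```python
def sort_bubble(lst):
    # Same adapted version as above: stop re-scanning past the last swap.
    comp_counter = 0
    swap_counter = 0
    n = len(lst)
    sorted_from = n
    while sorted_from > 0:
        last_swap_index = 0
        for j in range(sorted_from - 1):
            comp_counter += 1
            if lst[j] > lst[j + 1]:
                swap_counter += 1
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                last_swap_index = j + 1
        sorted_from = last_swap_index
    return comp_counter, swap_counter

n = 1000
print(sort_bubble(list(range(n))))         # best case: (999, 0) -> n-1 comparisons, 0 swaps
print(sort_bubble(list(range(n, 0, -1))))  # worst case: (499500, 499500) -> n(n-1)/2 of each
```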

    Time complexity

    Time complexity is something you get from an analysis. Here we can prove that the number of comparisons is Θ(𝑛) in the best case and Θ(𝑛²) in the worst case. The number of swaps has a best case of Θ(1) and a worst case of Θ(𝑛²).

    Meanwhile the number of swaps and comparisons is something closer to O(n^2)/2

    This confirms the misunderstanding of what time complexity is. O(𝑛²)/2 -- or better put, O(𝑛²/2) -- is the same as O(𝑛²). You can find more information about this at Wikipedia on Big O notation.
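    To make the constant-factor point concrete, here is a small numeric check (exact_comparisons is just an illustrative name for n(n−1)/2):

```python
def exact_comparisons(n):
    # The exact comparison count of the original (non-early-exit) bubble sort.
    return n * (n - 1) // 2

# n(n-1)/2 is bounded above by (1/2) * n^2 for every n >= 1; a constant
# factor such as 1/2 never changes the asymptotic class, so it is O(n^2).
for n in [10, 100, 1000, 10000]:
    assert exact_comparisons(n) <= 0.5 * n * n

print(exact_comparisons(10000))  # 49995000, the number from the question
```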