Tags: algorithm, time-complexity, big-o, heapsort

Why is the big-O notation for the heap sort algorithm O(n log n)?


The height of the binary tree keeps decreasing every time we remove a root element. Why do we use n (the total number of elements) times log n (the number of swaps for each root removal) to calculate the total time complexity, when the number of swaps actually varies depending on how many elements remain?

It seems like the correct expression of the time complexity should be the sum of the swaps performed over every iteration of root removal. The number of swaps for one removal would be log(i), where i is the number of remaining elements and log(i) is the depth of the binary tree, i.e. the cost of removing one element; i ranges from 1 to n. The sum from log(1) to log(n) would just be O(log(n!)), which seems smaller than the expected complexity O(n log(n)). I would really appreciate it if anyone could explain this.
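
For concreteness, here is a minimal sketch (not part of the original post) that counts exactly the quantity the question is about: the sift-down swaps performed while repeatedly removing the root during heapsort's extraction phase. The names `sift_down` and `count_extraction_swaps` are illustrative, not from any particular library.

```python
# Illustrative sketch: heapsort with a counter for extraction-phase swaps.
import math
import random

def sift_down(heap, start, end):
    """Restore the max-heap property below index `start`; return swaps performed."""
    swaps = 0
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # Pick the larger of the two children.
        if child + 1 <= end and heap[child] < heap[child + 1]:
            child += 1
        if heap[root] < heap[child]:
            heap[root], heap[child] = heap[child], heap[root]
            swaps += 1
            root = child
        else:
            break
    return swaps

def count_extraction_swaps(values):
    """Heapsort a copy of `values` and return the swaps done while removing roots."""
    heap = list(values)
    n = len(heap)
    # Build the max-heap; this phase is O(n) and is not counted here.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(heap, start, n - 1)
    swaps = 0
    # Repeatedly move the root to the end and sift the new root down.
    for end in range(n - 1, 0, -1):
        heap[0], heap[end] = heap[end], heap[0]
        swaps += sift_down(heap, 0, end - 1)
    return swaps

if __name__ == "__main__":
    n = 1 << 15
    swaps = count_extraction_swaps(random.sample(range(n), n))
    print(swaps, n * math.log2(n))  # compare the measured swaps against n log n
```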


Solution

  • There are at least two ways to look at this:

    A complete binary tree has Θ(n) leaves -- roughly n/2 of its nodes sit on the bottom level.

    Imagine a heap with 31 elements. This is a perfect binary tree -- the bottom level is full. The bottom level holds 16 elements, i.e. (n+1)/2 of them. That means that, in the worst case, each of the next 16 deletions costs the same number of swaps, because the height of the heap does not shrink until that bottom level has been emptied.

    Generalising: just removing about half of the nodes already requires on the order of (n/2)·log n swaps in the worst case, i.e. Ω(n log n); combined with the obvious O(n log n) upper bound, the total work is Θ(n log n).

    O(log(n!)) = O(n log n), so the sum the question computes is not actually smaller than n log n asymptotically.

    See this Q&A for a proof; a short derivation is also sketched below.
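
    A minimal sketch of that derivation (my own summary of the standard argument, not the content of the linked proof):

    ```latex
    \log(n!) \;=\; \sum_{i=1}^{n} \log i
             \;\le\; \sum_{i=1}^{n} \log n \;=\; n\log n,
    \qquad
    \log(n!) \;\ge\; \sum_{i=\lceil n/2\rceil}^{n} \log i
             \;\ge\; \frac{n}{2}\,\log\frac{n}{2}
             \;=\; \Omega(n\log n).
    ```

    Hence log(n!) = Θ(n log n): the sum of log(i) over all the removals grows at the same rate as n log n; dropping the small early terms only changes the constant factor, not the asymptotic order.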