Tags: python, algorithm, sorting, data-structures, defaultdict

Understanding the time complexity of nested sorting


I am solving LeetCode problem 1636, Sort Array by Increasing Frequency:

Given an array of integers nums, sort the array in increasing order based on the frequency of the values. If multiple values have the same frequency, sort them in decreasing order.

Return the sorted array.

I wrote the following working code:

from collections import Counter, defaultdict
from typing import List

class Solution:
    def frequencySort(self, nums: List[int]) -> List[int]:
        # Count occurrences, then sort entries by (frequency, value) ascending.
        ddict = Counter(nums)
        ddict = dict(sorted(ddict.items(), key=lambda x: (x[1], x[0])))

        # Group values by frequency; insertion order keeps frequencies ascending.
        defDict = defaultdict(list)
        res = []
        for k, v in ddict.items():
            defDict[v].append(k)

        del ddict
        # For each frequency, emit its values in descending order.
        for k, v in defDict.items():
            v.sort(reverse=True)
            for val in v:
                for _ in range(k):
                    res.append(val)

        return res

I think it has a time complexity of O(n · n log n), because in the worst case I am sorting a list in the defaultdict for every key.

But the time complexity analysis on LeetCode, as well as AI tools like ChatGPT and Perplexity AI, finds the time complexity to be O(n log n). I am confused. Can anyone please help me understand why it's not O(n · n log n)?


Solution

  • Sorting distinct chunks of data still amounts to O(n log n). For instance, say you have k keys in defDict, each with a list v of average length n/k; then sorting all of these lists amounts to a complexity of O(k · (n/k) log(n/k)), which is O(n log(n/k)), which is no worse than O(n log n) (see the worked sum below).

    Note that your v is already sorted in ascending order before the call to v.sort is made, so you could just have reversed v. Since Python's native sort (Timsort) handles such already-sorted input well, the call runs in O(m), where m is the size of v. This part of your algorithm therefore contributes O(n) in total, making the initial sort the determining step for the overall time complexity: O(n log n).
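To spell out the arithmetic behind the first point, assuming for simplicity k equally sized chunks of n/k elements each:

    \sum_{i=1}^{k} O\!\left(\frac{n}{k}\log\frac{n}{k}\right) = O\!\left(n\log\frac{n}{k}\right) \subseteq O(n\log n)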
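For illustration, here is a minimal sketch of the suggested simplification, reversing each already-ascending bucket instead of re-sorting it. The names mirror the question's code but are otherwise arbitrary:

from collections import Counter, defaultdict
from typing import List

class Solution:
    def frequencySort(self, nums: List[int]) -> List[int]:
        counts = Counter(nums)
        # One O(n log n) sort: ascending frequency, ties in ascending value.
        ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

        # Bucket values by frequency; each bucket fills in ascending order.
        buckets = defaultdict(list)
        for value, freq in ordered:
            buckets[freq].append(value)

        res = []
        for freq, values in buckets.items():
            # values is already ascending, so reversed() yields the required
            # descending order in O(len(values)) -- no second sort needed.
            for value in reversed(values):
                res.extend([value] * freq)
        return res

The only sort left is the initial sorted call, so the whole function is O(n log n).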