python · algorithm · time-complexity

Is My Time Complexity Analysis for Finding Universal Words O(m * k^2 + n*k) correct?


I'm given two string arrays, words1 and words2. A string b is a subset of string a if every character in b appears in a with at least the same frequency; for example, "oo" is a subset of "google" but not of "flow", which contains only one 'o'. A string in words1 is universal if every string in words2 is a subset of it. I need to return all the universal strings from words1, in any order.
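
To make the definition concrete, the subset relation is just a per-character count comparison. A minimal sketch (the helper name is_subset is my own, not part of the problem statement):

from collections import Counter

def is_subset(b: str, a: str) -> bool:
    # b is a subset of a if a contains every character of b
    # at least as many times as b does
    need, have = Counter(b), Counter(a)
    return all(have[ch] >= cnt for ch, cnt in need.items())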

I've written a function to find the universal words from two lists, words1 and words2. It uses a hashmap to store character frequencies from words2 and then checks each word in words1 against those frequencies. The code passed all the test cases, but I'm unsure whether my time complexity analysis is correct.

For time complexity:

Processing all words in words2 takes O(m * k), where m is the length of words2 and k is the average word length. Checking all words in words1 takes O(n * 26), where n is the length of words1 and 26 is the maximum number of hashmap keys (a-z).

If I also account for the count() calls inside both loops, the overall time complexity becomes O(m * k^2 + n * k). Is this analysis correct?

from typing import List

def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
    ans = []
    hmap = {}

    # Build the maximum required frequency of each character across words2
    for i in words2:
        for j in i:
            if j in hmap.keys():
                hmap[j] = max(hmap[j], i.count(j))
            else:
                hmap[j] = i.count(j)

    # Keep every word in words1 that meets all required frequencies
    for j in words1:
        flag = True
        for k in hmap.keys():
            if hmap[k] > j.count(k):
                flag = False
                break

        if flag:
            ans.append(j)

    return ans

Solution

  • Indeed, each call to count takes O(𝑘) time, as it needs to scan all the letters of the given word. That makes the first loop's time complexity O(𝑚𝑘²).
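
    For intuition, str.count with a single-character argument boils down to a linear scan, roughly equivalent to this sketch (an illustration, not CPython's actual implementation):

        def char_count(s: str, ch: str) -> int:  # O(k) for k = len(s)
            total = 0
            for c in s:
                if c == ch:
                    total += 1
            return total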

    Similarly, the second loop has a complexity of O(26𝑛𝑘) = O(𝑛𝑘), provided your assumption about the character range (a-z) holds.

    You can avoid count, and thereby reduce the complexity of the first loop, by simply incrementing a counter by 1 as you iterate over the letters.

    For instance, like this:

    def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
        # A helper function to count the frequency of each letter in a single string
        def get_freq(word):  # O(k)
            hmap = {}
            for ch in word:
                hmap[ch] = hmap.get(ch, 0) + 1
            return hmap
    
        hmap = {}
        # O(mk):
        for hmap2 in map(get_freq, words2):
            for ch, freq in hmap2.items():
                hmap[ch] = max(hmap.get(ch, 0), freq)
        
        res = []
        # O(nk)
        for hmap1, word1 in zip(map(get_freq, words1), words1):
            if all(hmap1.get(ch, 0) >= freq2 for ch, freq2 in hmap.items()):
                res.append(word1)
                
        return res
    

    So now the complexity is O((𝑛+𝑚)𝑘).
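
    As a quick sanity check, here is the behaviour on a small input of my own (assuming the method sits on the usual LeetCode-style Solution class):

        >>> Solution().wordSubsets(["google", "leetcode", "apple"], ["e", "oo"])
        ['google']

    The merged requirement is {'e': 1, 'o': 2}: "google" satisfies it, while "leetcode" has only one 'o' and "apple" has none.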

    We can adapt this further by using Counter from the collections module and more comprehension syntax:

    from collections import Counter

    def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
        # |= keeps, for each character, the maximum count seen in any word of words2
        hmap = Counter()
        for word2 in words2:
            hmap |= Counter(word2)

        # Counter >= Counter is a multiset comparison (available since Python 3.10)
        return [word1 for word1 in words1 if Counter(word1) >= hmap]
    

    The time complexity remains O((𝑛+𝑚)𝑘).
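
    One caveat: Counter's rich comparison operators such as >= were only added in Python 3.10. On older versions, an equivalent check is that subtracting the word's counts from the requirement leaves nothing behind, since Counter subtraction drops non-positive counts:

        # Python < 3.10 fallback for the >= multiset comparison:
        # (hmap - Counter(word1)) is empty iff word1 covers every required count
        return [word1 for word1 in words1 if not (hmap - Counter(word1))]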