I’m given two string arrays, words1 and words2. A string b is a subset of string a if all characters in b appear in a with at least the same frequency. I need to find all strings in words1 that are universal, meaning every string in words2 is a subset of them. I need to return those universal strings from words1 in any order.
I've written a function to find universal words from two lists, words1 and words2. The function uses a hashmap to store character frequencies from words2 and then checks each word in words1 against these frequencies. The code passed all the test cases but I am confused about analyzing time complexity.
For time complexity:
Processing all words in words2 takes O(m * k), where m = length of words2 and k = average word length. Checking all words in words1 involves O(n * 26), where n = length of words1, and 26 is the max number of hashmap keys (a-z).
If I consider the count() in both the loops it will make whole time complexity O(m * k^2 + n*k). Is this the correct analysis?
def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
ans = []
hmap = {}
for i in words2:
for j in i:
if j in hmap.keys():
hmap[j] = max(hmap[j], i.count(j))
else:
hmap[j] = i.count(j)
for j in words1:
flag = True
for k in hmap.keys():
if hmap[k] > j.count(k):
flag = False
break
if flag:
ans.append(j)
return ans
Indeed, the call of count
represents O(𝑘) time complexity, as it needs to scan all letters in the given word. So this makes the first loop's time complexity O(𝑚𝑘²).
Similarly, the second loop has a complexity of O(26𝑛𝑘) = O(𝑛𝑘), if indeed your assumption about the range of the characters (a-z) is correct.
You can avoid using count
and so reduce the complexity of the first loop, by just increasing a count by 1 as you iterate the letters.
For instance, like this:
def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
# A helper function to count the frequency of each letter in a single string
def get_freq(word): # O(k)
hmap = {}
for ch in word:
hmap[ch] = hmap.get(ch, 0) + 1
return hmap
hmap = {}
# O(mk):
for hmap2 in map(get_freq, words2):
for ch, freq in hmap2.items():
hmap[ch] = max(hmap.get(ch, 0), freq)
res = []
# O(nk)
for hmap1, word1 in zip(map(get_freq, words1), words1):
if all(hmap1.get(ch, 0) >= freq2 for ch, freq2 in hmap.items()):
res.append(word1)
return res
So now the complexity is O((𝑛+𝑚)𝑘).
We can further adapt and make use of Counter
from the collections
module, and make use of more comprehension-syntax:
def wordSubsets(self, words1: List[str], words2: List[str]) -> List[str]:
hmap = Counter()
for word2 in words2:
hmap |= Counter(word2)
return [word1 for word1 in words1 if Counter(word1) >= hmap]
The time complexity remains the same.