pythonalgorithmdictionarygreedyanagram

Most optimal solution O(1) to question: given a single word, return a list of anagrams present in the stream/list of words


I received this question in an interview, and coded out a solution, but it was not optimal.

Given a stream of words such as:

army, ramy, cat, eat, tea....

How can you store these words to support the following query:
Given a word return list of anagrams present in the stream

Implement Methods:

public void storeWords(String[] words);
public String[] getAnagrams(String word);

e.g.

  • getAnagrams("army") would return ["army", "ramy"]
  • getAnagrams("tac") would return ["cat"]

I need it to have a O(1) time complexity for lookup of getAnagrams(), which means storeWords() needs to store the anagrams in a way that looking it up would not require a loop. Currently, my solution is running on O(n) time, since I am using a loop. I'm not sure how to go about optimizing this. I was thinking of maybe using a Trie, but I don't know how that would work to give me a O(1) solution

My solution is this:

  1. create an anagram_map that takes in key: sum of their unicode numbers and value: list of words with that unicode sum
    • ex. cat would be key: ord(c) + ord(a) + ord(t) and value : [cat]
  2. getAnagrams would grab the list of possible anagrams from the sum of the unicode numbers from the word passed in. I then have a helper function of isAnagram that checks if the word is an anagram of our given word
  3. isAnagram has a map that counts the characters for the anagram and word. if everything in the map has a count of 0, it is an anagram
  4. append that to a list that we return

My code is below:

from collections import defaultdict
class Anagram:
    def __init__(self):
        self.anagram_map = defaultdict(list)
    
    def storeWords(self, words):
        for word in words:
            unicode_sum = 0

            for c in word:
                unicode_sum += ord(c)

            self.anagram_map[unicode_sum].append(word)

    def getAnagrams(self, word):
        unicode_sum = 0
        res = []
        
        for c in word:
            unicode_sum += ord(c)
            
        anagrams = self.anagram_map[unicode_sum]
        
        for anagram in anagrams:
            if self.isAnagram(anagram.word):
                res.append(anagram)
        
        return res
    
    def isAnagram(self, anagram, word):
        count_map = {}
        
        for c in anagram:
            if c in count_map:
                count_map[c] += 1
            else:
                count_map[c] = 1
        
        for w in word:
            if w in count_map:
                count_map[w] -= 1
            else:
                return False
        
        for count in count_map.values():
            if count != 0:
                return False
        
        return True

anagram = Anagram()

stream = ['army', 'ramy', 'cat', 'eat','tea']

anagram.storeWords(stream)

print(anagram.getAnagrams('army'))

print(anagram.getAnagrams('tac'))

Does anyone know how I can optimize this?


Solution

  • To detect anagrams, using ascii values to compute a hash is not a strong way to get unique hashes and would require separate handling for collisions. For example, below 2 strings will have the same ascii sum:

    abd -> 295
    bcb -> 295
    

    Instead, you can sort the characters of the string and use this sorted state as the dict key and store all the words that have the same sorted state since all anagrams will have the same state when sorted.

    abc, cab, bca, acb --> abc(sorted state)
    

    This way, you can get all the anagrams for the given word in O(1) time on an average for most of the testcases.

    Snippet:

    class Anagram:
        def __init__(self):
            self.anagram_map = dict()
        
        def storeWords(self, words):
            for word in words:
                sortW = ''.join(sorted(word))
                self.anagram_map[sortW] = self.anagram_map.get(sortW, [])
                self.anagram_map[sortW].append(word)
    
        def getAnagrams(self, word):
           return self.anagram_map.get(''.join(sorted(word)), [])
    

    Live Demo