Tags: python, python-3.x, performance, anagram

Anagram from large file


I have a file with 10,000 words in it. I wrote a program to find the anagrams in that file, but it takes too long to produce any output. For a small file the program works well. How can I optimize the code?

count=0
i=0
j=0
with open('file.txt') as file:
  lines = [i.strip() for i in file]
  for i in range(len(lines)):
      for j in range(i):
          if sorted(lines[i]) == sorted(lines[j]):
              #print(lines[i])
              count=count+1
              j=j+1
              i=i+1
print('There are ',count,'anagram words')

Solution

  • It is unclear whether you want duplicates counted. If not, you can remove duplicates from your list of words first, which in my opinion will save a lot of runtime. You can then check each word for anagrams and use sum() to get their total number. This should do it:

    def get_unique_words(lines):
        # Collect each distinct word once, preserving first-seen order.
        unique = []
        for word in " ".join(lines).split():
            if word not in unique:
                unique.append(word)
        return unique

    def check_for_anagrams(test_word, words):
        # Count the other words that use exactly the same letters as test_word.
        return sum(1 for word in words if sorted(test_word) == sorted(word) and word != test_word)

    with open('file.txt') as file:
        lines = [line.strip() for line in file]

    unique = get_unique_words(lines)
    # Every anagram pair is counted twice here, once from each of its two words.
    count = sum(check_for_anagrams(word, unique) for word in unique)

    print('There are', count, 'unique anagram words aka', count // 2, 'unique anagram couples')
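
A further option, offered as a sketch and not as part of the code above, is to avoid comparing every word with every other word. The pairwise comparison grows quadratically with the number of words. If each word is instead filed under its sorted letters in a dictionary, anagrams end up in the same bucket after a single pass. The helper names `group_anagrams` and `count_anagram_pairs` are made up for this illustration, and the sketch assumes case-sensitive matching and that duplicates should be ignored.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group distinct words by their sorted-letter signature (hypothetical helper)."""
    groups = defaultdict(set)  # a set per signature drops duplicate words
    for word in words:
        if word:  # skip blank lines
            groups["".join(sorted(word))].add(word)
    # Keep only signatures shared by at least two distinct words.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def count_anagram_pairs(groups):
    """Count unordered pairs: a group of n words contributes n * (n - 1) // 2."""
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())


if __name__ == "__main__":
    # Stand-in data; with a real file, pass the stripped lines instead.
    sample = ["listen", "silent", "enlist", "google", "gogole", "cat", "listen", ""]
    groups = group_anagrams(sample)
    print(groups)
    print("Anagram pairs:", count_anagram_pairs(groups))  # 3 + 1 = 4
```

To run this on the file from the question, read the lines as before (`lines = [line.strip() for line in file]`) and call `group_anagrams(lines)`. Each word is sorted only once, so 10,000 words should finish almost instantly.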