[SOLVED] Anagram from large file

I have a file with 10,000 words in it. I wrote a program to find the anagrams in that file, but it takes too long to produce any output. For a small file the program works well. How can I optimize the code?

count = 0
with open('file.txt') as file:
    lines = [line.strip() for line in file]

# Compare every word with every earlier word (quadratic in the number of words)
for i in range(len(lines)):
    for j in range(i):
        if sorted(lines[i]) == sorted(lines[j]):
            #print(lines[i])
            count = count + 1
print('There are', count, 'anagram words')

Solution

It is unclear whether you want duplicates to count or not. If you don't, you can remove duplicates from your list of words first; fewer words means fewer comparisons, which in my opinion will spare you a huge amount of runtime. You can then check each word for anagrams and use sum() to get their total number. This should do it:

def get_unique_words(lines):
    # Split all lines into words and keep the first occurrence of each,
    # preserving order (dict keys are unique and insertion-ordered).
    return list(dict.fromkeys(" ".join(lines).split()))

def check_for_anagrams(test_word, words):
    # Count the words that use the same letters as test_word,
    # excluding test_word itself. Sort test_word only once.
    key = sorted(test_word)
    return sum(1 for word in words if word != test_word and sorted(word) == key)

with open('file.txt') as file:
    lines = [line.strip() for line in file]

unique = get_unique_words(lines)
count = sum(check_for_anagrams(word, unique) for word in unique)

# Each couple is counted twice (once from each side), so halve the total.
print('There are', count, 'unique anagram words aka', count // 2, 'unique anagram couples')
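If this is still too slow for your file, note that every word is still compared against every other word. A different technique, not the one used above, is to sort the letters of each word once and group words by that sorted "signature" in a dict. Here is a minimal sketch of that idea, assuming one word per line and that duplicates should be ignored; the names read_words and count_anagram_pairs are just illustrative:

```python
from collections import defaultdict

def read_words(path):
    # Assumption: the file holds one word per line; blank lines are skipped.
    with open(path) as file:
        return [line.strip() for line in file if line.strip()]

def count_anagram_pairs(words):
    # Group distinct words by their sorted letters; words that share
    # a signature are anagrams of each other.
    groups = defaultdict(set)
    for word in words:
        if word:
            groups["".join(sorted(word))].add(word)
    # A group of n distinct words contains n * (n - 1) / 2 unordered pairs.
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())

sample = ["listen", "silent", "enlist", "google", "elgoog", "cat", "cat"]
print('There are', count_anagram_pairs(sample), 'unique anagram couples')
```

For your file you would call count_anagram_pairs(read_words('file.txt')). Each word is sorted exactly once, so the work grows roughly linearly with the number of words instead of quadratically.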