python text anagram

Construct a list of possible sentences from given letter quotas and wordlist.txt


I have a wordlist.txt file with one word per line.

Suppose I specify how many times each letter may be used, for example:

n: 1
e: 1
w: 1
b: 1
o: 2
k: 1
The quota for every other letter is 0.
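
A quota table like this maps naturally onto `collections.Counter`. The following is a minimal sketch; the variable name `quotas` is only illustrative:

```python
from collections import Counter

# Letter quotas from the example above; unlisted letters have a quota of 0.
quotas = Counter({'n': 1, 'e': 1, 'w': 1, 'b': 1, 'o': 2, 'k': 1})

# A Counter returns 0 for missing keys, which matches "every other letter is 0".
print(quotas['o'])  # 2
print(quotas['z'])  # 0

# Total number of letters that must be spent.
print(sum(quotas.values()))  # 7
```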

How can I construct sentences that spend every letter quota down to exactly zero, using only words defined in wordlist.txt?

For example, the quotas above should return "new book" or "book new". Word order doesn't matter.

Both "new" and "book" exist in wordlist.txt.

So the list of possible sentences might look like this:

new book
book new
bow neko
neko bow
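
To make the requirement concrete, here is a small sketch of a check that a sentence spends the quotas exactly. The helper name `spends_quota_exactly` is hypothetical, and the sketch assumes lowercase input where spaces are ignored:

```python
from collections import Counter

def spends_quota_exactly(sentence, quotas):
    """Return True if the letters of `sentence` use up `quotas` exactly."""
    letters = Counter(ch for ch in sentence if ch.isalpha())
    return letters == Counter(quotas)

quotas = {'n': 1, 'e': 1, 'w': 1, 'b': 1, 'o': 2, 'k': 1}
print(spends_quota_exactly('new book', quotas))  # True
print(spends_quota_exactly('bow neko', quotas))  # True
print(spends_quota_exactly('new bow', quotas))   # False: uses 'w' twice
```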

Solution

  • Suppose wordlist.txt contains a few more words, so that some words are anagrams of each other:

    bow
    book
    new
    neko
    ujang
    wen
    koob
    

    If book and koob are each sorted by letter, both give the same value, bkoo. This shared sorted value is used as the word's anagram_id.

    Instead of defining the quotas explicitly, I can pass a string that contains the same letters, because once sorted it is equivalent to the quotas.

    from itertools import combinations, product

    def filterOrigin(text):
      # Not shown in the original post; assumed here to keep lowercase letters only.
      return ''.join(ch for ch in text.lower() if ch.isalpha())

    def generate_anagrams(input_sentence='k o bo ew n', wordlist_filepath='filtered_wordlist.txt'):
      input_sentence = filterOrigin(input_sentence)

      with open(wordlist_filepath, 'r') as file:
        wordlist = file.read().splitlines()

      # The sorted letters of a word are its anagram id.
      anagram_id = [''.join(sorted(word)) for word in wordlist]

      sorted_input_sentence = ''.join(sorted(input_sentence))

      # Anagram id of every letter combination that can be drawn from the input.
      all_anagram_id = []
      for i in range(1, len(input_sentence)+1):
        combs = combinations(input_sentence, i)
        all_anagram_id += [''.join(sorted(comb)) for comb in combs]

      # Keep only the ids that also occur in the wordlist.
      all_registered_anagram_id = []
      for id_from_input in all_anagram_id:
        for id_from_wordlist in anagram_id:
          if id_from_input == id_from_wordlist:
            all_registered_anagram_id.append(id_from_wordlist)

      # Map each registered id to every word that shares it.
      all_registered_anagram_values = dict()
      for aid in all_registered_anagram_id:
        all_registered_anagram_values[aid] = [wordlist[i] for i, x in enumerate(anagram_id) if x == aid]

      # Candidate sentences: every combination of registered ids (the set drops duplicates).
      sentence_combs = []
      for l in range(1, len(all_registered_anagram_id)+1):
        sentence_combs.append(set(combinations(all_registered_anagram_id, l)))

      valid_sentences_id = []
      for comb in sentence_combs:
        for pair in comb:
          candidate = ''.join(pair)
          # Compare string with string: sorted() on its own returns a list.
          if sorted_input_sentence == ''.join(sorted(candidate)): # is anagram?
            valid_sentences_id.append(pair)

      # Expand each id back into the real words that share it.
      valid_sentences = []
      for sid in valid_sentences_id:
        broadcasted = [all_registered_anagram_values[aid] for aid in sid]
        for sentence in product(*broadcasted):
          valid_sentences.append(' '.join(sentence))

      return valid_sentences

    print(generate_anagrams())
    

    Output:

    ['new book', 'new koob', 'wen book', 'wen koob', 'bow neko']
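
The approach above enumerates every combination of input letters, so the work grows very quickly with the length of the input. As a possible alternative (not the method used in the answer above), here is a sketch of a backtracking search that subtracts each word's letters from the remaining quota. The function name `find_sentences` is hypothetical, and the sketch assumes lowercase words and reports each set of words only once, in wordlist order:

```python
from collections import Counter

def find_sentences(letters, words):
    """Return sentences whose letters use up `letters` exactly."""
    target = Counter(ch for ch in letters if ch.isalpha())
    # Keep only non-empty words that fit inside the quota at all.
    candidates = [(w, Counter(w)) for w in words
                  if w and not (Counter(w) - target)]
    results = []

    def search(start, remaining, chosen):
        if not remaining:  # every quota has reached zero
            results.append(' '.join(chosen))
            return
        for i in range(start, len(candidates)):
            word, count = candidates[i]
            if not (count - remaining):  # word fits in what is left
                # Passing i (not i + 1) allows the same word to be reused.
                search(i, remaining - count, chosen + [word])

    search(0, target, [])
    return results

words = ['bow', 'book', 'new', 'neko', 'ujang', 'wen', 'koob']
print(find_sentences('k o bo ew n', words))
# ['bow neko', 'book new', 'book wen', 'new koob', 'wen koob']
```

Since word order doesn't matter in the question, each combination appears once here; if every ordering is wanted, the words of each result can be permuted afterwards with `itertools.permutations`.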