I have a wordlist.txt file that contains one word per line.
Suppose I specify a quota for each letter of the alphabet, for example:
n: 1
e: 1
w: 1
b: 1
o: 2
k: 1
The quota for every remaining letter is 0.
How can I construct a sentence from the given letter quotas, such that every quota is spent down to exactly zero and only words defined in wordlist.txt are used?
For example, from the quotas above it should return "new book" or "book new" (word order doesn't matter), where both "new" and "book" exist in wordlist.txt.
So the list of possible sentences might look like this:
new book
book new
bow neko
neko bow
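To make the requirement concrete, here is a minimal sketch of the validity check, assuming the quotas are held in a collections.Counter. The name spends_quota is made up for illustration and is not part of my code.

```python
from collections import Counter

# Letter quotas from the example above; letters not listed have quota 0.
quota = Counter({"n": 1, "e": 1, "w": 1, "b": 1, "o": 2, "k": 1})


def spends_quota(sentence, quota):
    """Return True if the sentence uses every letter quota exactly (spaces ignored)."""
    used = Counter(ch for ch in sentence if ch.isalpha())
    return used == quota


print(spends_quota("new book", quota))  # True
print(spends_quota("neko bow", quota))  # True
print(spends_quota("new bow", quota))   # False: two w's, only one o, no k
```

This only checks the letters; whether each word exists in wordlist.txt is a separate test.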
Suppose wordlist.txt has a few additional words, so that several anagrams of the same letters must be handled:
bow
book
new
neko
ujang
wen
koob
If the letters of book and koob are sorted, both give the same value, bkoo. I treat this shared sorted value as the words' anagram_id.
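As a small sketch of that idea, the words can be grouped by their sorted letters. The word list is hard-coded here instead of being read from wordlist.txt, and the names anagram_id_of and groups are only illustrative.

```python
from collections import defaultdict


def anagram_id_of(word):
    """Sorted letters of a word; anagrams share the same value."""
    return "".join(sorted(word))


# Hard-coded copy of the example word list (assumption: normally read from wordlist.txt).
words = ["bow", "book", "new", "neko", "ujang", "wen", "koob"]

groups = defaultdict(list)
for word in words:
    groups[anagram_id_of(word)].append(word)

print(groups["bkoo"])  # ['book', 'koob']
print(groups["enw"])   # ['new', 'wen']
```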
Instead of defining the quotas explicitly, I can just write a string that represents them directly, because once both are sorted the result is the same.
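A quick sketch of why the two forms are interchangeable, assuming spaces in the string are ignored. The helper names quota_to_id and text_to_id are hypothetical.

```python
# Letter quotas from the example; letters not listed have quota 0.
quota = {"n": 1, "e": 1, "w": 1, "b": 1, "o": 2, "k": 1}


def quota_to_id(quota):
    """Expand each letter by its quota, in alphabetical order."""
    return "".join(letter * count for letter, count in sorted(quota.items()))


def text_to_id(text):
    """Sort the letters of a string, ignoring spaces and other non-letters."""
    return "".join(sorted(ch for ch in text if ch.isalpha()))


print(quota_to_id(quota))         # beknoow
print(text_to_id("k o bo ew n"))  # beknoow
```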
from itertools import combinations, product


def filterOrigin(text):
    # Assumed helper (its definition is not shown in this post):
    # keep letters only, lowercased.
    return ''.join(ch for ch in text.lower() if ch.isalpha())


def generate_anagrams(input_sentence='k o bo ew n', wordlist_filepath='filtered_wordlist.txt'):
    input_sentence = filterOrigin(input_sentence)
    with open(wordlist_filepath, 'r') as file:
        wordlist = file.read().splitlines()

    # The sorted letters of a word are its anagram id.
    anagram_id = [''.join(sorted(word)) for word in wordlist]
    sorted_input_sentence = ''.join(sorted(input_sentence))

    # Every sorted letter combination of the input is a candidate anagram id.
    all_anagram_id = []
    for i in range(1, len(input_sentence) + 1):
        combs = combinations(input_sentence, i)
        all_anagram_id += [''.join(sorted(comb)) for comb in combs]

    # Keep the candidate ids that match a word in the wordlist.
    all_registered_anagram_id = []
    for id_from_input in all_anagram_id:
        for id_from_wordlist in anagram_id:
            if id_from_input == id_from_wordlist:
                all_registered_anagram_id.append(id_from_wordlist)

    # Map each registered anagram id to all words that share it.
    all_registered_anagram_values = dict()
    for aid in all_registered_anagram_id:
        all_registered_anagram_values[aid] = [wordlist[i] for i, x in enumerate(anagram_id) if x == aid]

    # Build every combination of registered ids, of every length.
    sentence_combs = []
    for l in range(1, len(all_registered_anagram_id) + 1):
        sentence_combs.append(set(combinations(all_registered_anagram_id, l)))

    # A combination is valid if its letters are an anagram of the input.
    valid_sentences_id = []
    for comb in sentence_combs:
        for pair in comb:
            candidate = ''.join(pair)
            # Compare string with string; sorted() alone returns a list.
            if sorted_input_sentence == ''.join(sorted(candidate)):
                valid_sentences_id.append(pair)

    # Expand each valid id combination into the actual words.
    valid_sentences = []
    for sid in valid_sentences_id:
        broadcasted = [all_registered_anagram_values[aid] for aid in sid]
        for sentence in product(*broadcasted):
            valid_sentences.append(' '.join(sentence))
    return valid_sentences


generate_anagrams()
This returns:
['new book', 'new koob', 'wen book', 'wen koob', 'bow neko']