I have created a word-level text generator using an LSTM model. But in my case, not every word is suitable to be selected; I want each word to match two additional conditions:

1. Each word has a map: 1 for each vowel and 0 for each consonant (for instance, overflow would be 10100010). The generated sentence then needs to match a given structure, for instance, 01001100 (hi 01 and friend 001100).
2. The last vowel of the last word must be a given one.

To handle this scenario, I've created a pandas dataframe with the following structure:
```
word    last_vowel  word_map
-----   ----------  --------
hello   o           01001
stack   a           00100
jhon    o           0010
```
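For reference, a minimal sketch of how such a dataframe could be built; the vowel set and the helper names are mine, not from the original setup:

```python
import pandas as pd

VOWELS = set("aeiou")

def word_map(word):
    # 1 for each vowel, 0 for each consonant
    return "".join("1" if c in VOWELS else "0" for c in word)

def last_vowel(word):
    # rightmost vowel in the word, or None if there is none
    vowels = [c for c in word if c in VOWELS]
    return vowels[-1] if vowels else None

words = ["hello", "stack", "jhon"]
df = pd.DataFrame({
    "word": words,
    "last_vowel": [last_vowel(w) for w in words],
    "word_map": [word_map(w) for w in words],
})
print(df)
```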
This is my current workflow:

1. Given the sentence structure, I choose a word from the dataframe whose word_map matches the beginning of it. For instance, if the sentence structure is 0100100100100, we can choose the word hello, as its vowel structure is 01001.
2. I subtract the chosen map from the structure: 0100100100100 will become 00100100, as we've removed the initial 01001 (hello).
3. I retrieve all the words from the dataframe whose word_map matches the beginning of the remaining structure, in this case stack 00100 and jhon 0010.
4. I pass the sentence so far to the LSTM model, which returns a weight for each candidate word, and I select the candidate with the highest weight.
5. I repeat from step 2 until the remaining structure is empty.
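A minimal sketch of that greedy loop, with lstm_weights as a hypothetical stand-in for the model call returning one score per candidate word:

```python
def matching_words(df, structure):
    # words whose word_map is a prefix of the remaining structure
    return df[df["word_map"].apply(structure.startswith)]

def generate(df, structure, lstm_weights):
    sentence = []
    while structure:
        candidates = matching_words(df, structure)
        if candidates.empty:
            raise ValueError("dead end: no word fits " + structure)
        # lstm_weights(sentence, words) is a hypothetical stand-in
        # for the LSTM call; it returns one score per candidate word
        weights = lstm_weights(sentence, candidates["word"].tolist())
        best = candidates.iloc[max(range(len(weights)), key=weights.__getitem__)]
        sentence.append(best["word"])
        structure = structure[len(best["word_map"]):]  # subtract the used prefix
    return sentence
```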
.That works like a charm, but there is one remaining condition to handle: the last vowel of the sentence.
My way to deal with this issue is brute force: I generate a large batch of sentences forcing the last vowel to be the required one, score each of them with the weights returned by the LSTM model, and keep the highest-scoring sentence.
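A rough sketch of that brute-force selection, where generate_with_last_vowel stands for a constrained variant of the loop above and score for whatever aggregate of the LSTM weights ranks a sentence (both hypothetical):

```python
def best_of_n(df, structure, vowel, n=1000):
    # generate_with_last_vowel and score are hypothetical stand-ins;
    # generate n constrained sentences and keep the best-scoring one
    candidates = (generate_with_last_vowel(df, structure, vowel)
                  for _ in range(n))
    return max(candidates, key=score)
```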
Do you think there is a better approach? Maybe a GAN or reinforcement learning?
EDIT: I think another approach would be to use a WFST (weighted finite-state transducer). I've heard about the pynini library, but I don't know how to apply it to my specific context.
If you are happy with your approach, the easiest way might be to train your LSTM on the reversed sequences, so that it gives the weight of the previous word rather than the next one. In that case, you can use the method you already employ, except that the first subset of words would be the one satisfying the last-vowel constraint. I don't believe that this is guaranteed to produce the best result.
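As a sketch of how that could look at generation time, assuming a second model trained on reversed word sequences (reversed_lstm_weights is a hypothetical stand-in for it):

```python
def generate_reversed(df, structure, vowel, reversed_lstm_weights):
    # walk the structure from the end, so the last word is chosen first
    remaining = structure[::-1]
    sentence = []  # built in reverse order
    while remaining:
        candidates = df[df["word_map"].apply(
            lambda m: remaining.startswith(m[::-1]))]
        if not sentence:
            # the very first pick is the sentence's last word,
            # so the last-vowel constraint is enforced right here
            candidates = candidates[candidates["last_vowel"] == vowel]
        if candidates.empty:
            raise ValueError("dead end: no word fits")
        weights = reversed_lstm_weights(sentence, candidates["word"].tolist())
        best = candidates.iloc[max(range(len(weights)), key=weights.__getitem__)]
        sentence.append(best["word"])
        remaining = remaining[len(best["word_map"]):]
    return sentence[::-1]  # back to reading order
```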
Now, if that reversal is not possible, or if, after reading my answer further, you find that this doesn't find the best solution, then I suggest using a pathfinding algorithm. This is similar to reinforcement learning, but not statistical, as the weights computed by the trained LSTM are deterministic. What you currently use is essentially a depth-first greedy search, which, depending on the LSTM output, might even be optimal: say, if the LSTM gives you a guaranteed monotonic increase in the sum that doesn't vary much between the acceptable subsequent words (i.e., the difference between the N-1 and N word sequences is much larger than the difference between the different options for the Nth word). In the general case, when there is no clear heuristic to help you, you will have to perform an exhaustive search. If you can come up with an admissible heuristic, you can use A* instead of Dijkstra's algorithm in the first option below, and the better your heuristic, the faster it will be.
I suppose it is clear, but just in case: your graph connectivity is defined by your constraint sequence. The initial node (the 0-length sequence with no words) is connected to every word in your dataframe whose map matches the beginning of your constraint sequence. So you do not have the graph as an explicit data structure; the constraint is its compressed description.
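In code, that implicit graph is nothing more than a successor function over the remaining constraint (a sketch, using the same dataframe as above):

```python
def successors(df, remaining):
    # one outgoing edge per word whose map is a prefix of what is left
    for _, row in df[df["word_map"].apply(remaining.startswith)].iterrows():
        yield row["word"], remaining[len(row["word_map"]):]
```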
EDIT: As per the request in the comments, here are additional details. There are a couple of options:
1. Apply Dijkstra's algorithm multiple times. Dijkstra's search finds the shortest path between two known nodes, while in your case we only have the initial node (the 0-length sequence with no words) and the final words are unknown (see the first sketch after this list).
2. Modify your existing depth-first search to do an exhaustive search (see the second sketch after this list).
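For option 1, here is a minimal sketch of a Dijkstra-style (uniform-cost) search over that implicit graph. It assumes the LSTM weights can be treated as probabilities, so that -log(weight) gives the non-negative edge costs Dijkstra requires; word_probability is a hypothetical stand-in for the model call, and successors is the helper sketched earlier. Because the LSTM score depends on the whole sentence so far, the sketch searches the tree of partial sentences rather than deduplicating nodes, and a single run with every completed sentence as a goal replaces the multiple runs mentioned above:

```python
import heapq
import math

def dijkstra_sentence(df, structure, word_probability):
    # word_probability(sentence, word) -> P(word | sentence); hypothetical
    # stand-in for the LSTM. Costs are -log P, non-negative as required.
    counter = 0  # tie-breaker so the heap never has to compare lists
    heap = [(0.0, counter, structure, [])]
    while heap:
        cost, _, remaining, sentence = heapq.heappop(heap)
        if not remaining:
            # here you would also check the last-vowel condition
            return sentence  # the first completed sentence is the cheapest
        for word, rest in successors(df, remaining):
            counter += 1
            step = -math.log(word_probability(sentence, word))
            heapq.heappush(heap, (cost + step, counter, rest, sentence + [word]))
    return None  # the structure cannot be filled with the available words
```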
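And for option 2, the same search done exhaustively depth-first, reusing the same hypothetical word_probability and successors; it visits every sentence that fits the structure and keeps the cheapest one:

```python
import math

def exhaustive_search(df, structure, word_probability):
    best_cost, best_sentence = math.inf, None

    def dfs(remaining, sentence, cost):
        nonlocal best_cost, best_sentence
        if not remaining:
            # a complete sentence; keep it if it beats the best so far
            if cost < best_cost:
                best_cost, best_sentence = cost, sentence
            return
        for word, rest in successors(df, remaining):
            step = -math.log(word_probability(sentence, word))
            dfs(rest, sentence + [word], cost + step)

    dfs(structure, [], 0.0)
    return best_sentence
```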