I have seen many people working on Neural Machine Translation. Usually they wrap their sentences in <BOS>/<EOS>, <START>/<END>, etc. tags before training the network. Of course it's a logical solution to mark the start and end of sentences, but I wonder: how does the neural network understand that the string <END> (or any of the others) means the end of a sentence?
It doesn't.
At inference time, there's a hardcoded rule that if that token is generated, the sequence is done, and the underlying neural model will no longer be asked for the next token. Say we tokenize the source sentence first:
source_seq = tokenize('This is not a test.')
print(source_seq)
At this point you'd get something like:
[ '<BOS>', 'Thi###', ... , '###t', '.' , '<EOS>' ]
Now we build the target sequence with the same format:
target_seq = [ '<BOS>' ]

while True:
    token = model.generate_next_token(source_seq, target_seq)
    # Hardcoded stopping rule: '<EOS>' ends generation here, in the loop.
    if token == '<EOS>':
        break
    target_seq.append(token)
The model itself only predicts the most likely next token given the current state (the input sequence and the output sequence so far).
It can't exit the loop any more than it can pull your machine's plug out of the wall.
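To make that concrete, here is a minimal sketch of what a greedy generate_next_token could look like inside. The vocab list and dummy_forward function are made-up stand-ins for a real vocabulary and a real network, not any particular library's API; the point is just that '<EOS>' is one more vocabulary entry that gets a probability like every other token:

import numpy as np

# Toy vocabulary: '<EOS>' is simply entry number 2, with no special
# meaning inside the network.
vocab = ['<PAD>', '<BOS>', '<EOS>', 'This', 'is', 'not', 'a', 'test', '.']

rng = np.random.default_rng(0)

def dummy_forward(source_seq, target_seq):
    # Stand-in for the real model: one score (logit) per vocabulary entry
    # for the next position. A trained network would compute these from
    # the source sentence and the target-so-far.
    return rng.normal(size=len(vocab))

def generate_next_token(source_seq, target_seq):
    logits = dummy_forward(source_seq, target_seq)
    # Softmax turns the scores into a probability distribution over the
    # whole vocabulary, '<EOS>' included.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Greedy pick: whichever token has the highest probability, which may
    # or may not be '<EOS>'. The decision to stop lives in the loop above.
    return vocab[int(np.argmax(probs))]

print(generate_next_token(['<BOS>', 'This', 'is', 'not', 'a', 'test', '.', '<EOS>'], ['<BOS>']))

Training is what makes the network put high probability on '<EOS>' at the points where the target sentences in its training data end; but even then, actually stopping is the loop's job, not the model's.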
Note that that's not the only hardcoded rule here. The other one is the decision to start from the first token and only ever append (never prepend, never delete...), like a human speaking.