I just started using GPT2 and I have a question concerning special tokens:
I'd like to predict the next word given a text input, but I want to mask some words in my input chunk using a special token. I don't want GPT2 to predict the masked words; I just don't want them used for the prediction, and I want GPT2 to "know" that it doesn't "see" all of the input words.
Here's an example: my input sentence is "the quick brown fox jumps over the lazy", and I want GPT2 to predict the next word (the correct answer would be "dog"). I also want to mask the words "the lazy", but GPT2 should "know" there is something at the end of the input. So for GPT2 the input should look like "the quick brown fox jumps over _ _", and not like "the quick brown fox jumps over", so that it knows the word to predict is not the one directly after "over".
I thought about using special tokens to replace the "hidden" words, but I think neither MASK nor PAD makes sense in this case.
Does anyone have an idea how to solve this?
Thanks in advance for your help!
Solved this: masking the tokens did the trick. I used an attention mask and set the attention-mask values of the tokens I wanted to ignore to 0, so those tokens get attention weight 0 in every layer.
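For anyone landing here later, a minimal sketch of what this looks like, assuming the Hugging Face `transformers` library and the pretrained `gpt2` checkpoint. The `[-2:]` slice assumes "the lazy" tokenizes to exactly two BPE tokens (true for this sentence, but check with `tokenizer.tokenize()` for your own inputs):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "the quick brown fox jumps over the lazy"
inputs = tokenizer(text, return_tensors="pt")

# Hide the last two tokens ("the lazy"): with attention_mask = 0 they get
# attention weight 0 as keys in every layer, so no other position can read
# their content, but they still occupy positions in the sequence.
attention_mask = inputs["attention_mask"].clone()
attention_mask[0, -2:] = 0

with torch.no_grad():
    out = model(input_ids=inputs["input_ids"], attention_mask=attention_mask)

# The logits at the final position predict the token that follows the
# full (partially hidden) input.
next_token_id = int(out.logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```

One caveat: the attention mask only stops other positions from attending to the hidden tokens. Each hidden position's own input embedding still flows through its residual stream, so if you want the masked content fully out of the picture, you may additionally want to overwrite those `input_ids` with a fixed placeholder id before the forward pass.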