pytorch, huggingface-transformers

What does "permutation invariant" mean in the context of transformers doing language modelling?


I am reading a research paper called LiLT, where they mention that transformers are permutation invariant. What does permutation invariant mean in the case of language modelling? link to paper



Solution

  • Since all tokens in the sequence are treated equally in a transformer, changing the order of the input tokens (= permutation) results in the same output (= invariance). To avoid this, one adds positional embeddings: vectors added to each token embedding that encode the token's position in the sequence.

    E.g. in language modelling, "I traveled from France to England and saw" should result in something like "London" as the next word. But without positional embeddings, the transformer can't differentiate between that sentence and "I traveled from England to France and saw", so it might just as well respond with "Paris". The order of words matters; therefore, permutation invariance is undesirable in language modelling. The sketch below demonstrates this property on a plain encoder layer.
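A minimal PyTorch sketch of this property, assuming a single `nn.TransformerEncoderLayer` with toy dimensions and no positional embeddings (all shapes and values here are illustrative assumptions, not taken from the LiLT paper): permuting the input tokens simply permutes the per-token outputs the same way, and any order-insensitive pooling of them is identical, so the model cannot tell the two word orders apart.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy encoder layer with no positional embeddings (hypothetical setup for illustration).
d_model, n_heads, seq_len = 16, 4, 5
layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=32,
                                   dropout=0.0, batch_first=True)
layer.eval()  # deterministic forward pass (no dropout)

x = torch.randn(1, seq_len, d_model)   # token embeddings only, no position information
perm = torch.randperm(seq_len)         # a random reordering of the tokens

with torch.no_grad():
    out_original = layer(x)
    out_permuted = layer(x[:, perm, :])

# Permuting the input just permutes the per-token outputs correspondingly,
# so the layer has no notion of word order ...
print(torch.allclose(out_original[:, perm, :], out_permuted, atol=1e-6))  # True

# ... and any order-insensitive pooling (e.g. the mean over tokens) is
# exactly the same for both orderings: permutation invariance.
print(torch.allclose(out_original.mean(dim=1), out_permuted.mean(dim=1), atol=1e-6))  # True
```

Adding positional embeddings to `x` before the layer breaks this symmetry, which is exactly why they are used.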