Today my teacher asked me a question: he said that CNNs exploit the translation invariance of images or matrices. So which properties do Transformers exploit?
There are two main properties of transformers that make them so appealing compared to convolutions:
See Sec. 3 and Fig. 3 in:
Shir Amir, Yossi Gandelsman, Shai Bagon, and Tali Dekel, "Deep ViT Features as Dense Visual Descriptors" (arXiv, 2021).
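
As a rough numerical illustration of the kind of structural property being discussed, the sketch below checks two well-known facts: a (circular) convolution commutes with shifts of its input, and self-attention without positional encodings commutes with permutations of its tokens. It is a toy, pure-Python sketch and is not taken from the cited paper; the helper names (`circular_conv1d`, `shift`, `self_attention`, `close`) and the identity query/key/value projections are assumptions made for brevity.

```python
import math
import random


def circular_conv1d(signal, kernel):
    """Circular 1-D cross-correlation: out[i] = sum_k kernel[k] * signal[(i + k) % n]."""
    n = len(signal)
    return [sum(w * signal[(i + k) % n] for k, w in enumerate(kernel)) for i in range(n)]


def shift(seq, s):
    """Cyclically shift a sequence to the right by s positions."""
    n = len(seq)
    return [seq[(i - s) % n] for i in range(n)]


def self_attention(tokens):
    """Single-head self-attention.

    Assumptions: identity query/key/value projections and no positional encoding.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def close(a, b, tol=1e-9):
    """Element-wise comparison of two equal-length vectors up to a tolerance."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


random.seed(0)

# Convolution: shifting the input and then convolving equals convolving and then shifting.
signal = [random.random() for _ in range(8)]
kernel = [0.25, 0.5, 0.25]
conv_equivariant = close(
    circular_conv1d(shift(signal, 3), kernel),
    shift(circular_conv1d(signal, kernel), 3),
)

# Self-attention: permuting the input tokens permutes the output tokens the same way.
tokens = [[random.random() for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]
out = self_attention(tokens)
permuted_out = self_attention([tokens[p] for p in perm])
attn_equivariant = all(close(permuted_out[i], out[p]) for i, p in enumerate(perm))

print(conv_equivariant, attn_equivariant)
```

In practice, vision transformers add positional encodings to the tokens, so the permutation check above only applies to the bare attention operation in this simplified setting.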