Transformers

2017-Attention is all you need

See [VSP+17], the paper that introduced the Transformer architecture.
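The core operation of [VSP+17] is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula. It assumes Q, K and V are given as lists of row vectors. The helper names `softmax` and `scaled_dot_product_attention` are illustrative. Real models use batched tensor libraries and add multi-head projections and masking, which are omitted here.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v (lists of row vectors).

    Returns an n x d_v list: each output row is a weighted average of the
    rows of V, weighted by softmax(q . k / sqrt(d_k)) over all keys k.
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    d_v = len(V[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output coordinate at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    # Two identical keys get equal weight, so the output is the mean of V.
    out = scaled_dot_product_attention(
        [[1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 4.0]]
    )
    print(out)  # prints [[1.0, 2.0]]
```

The 1/sqrt(d_k) factor keeps dot products from growing with the key dimension, which would otherwise push the softmax toward very peaked weights.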

2020-An image is worth 16x16 words: Transformers for image recognition at scale

See [DBK+20], the paper that introduced the Vision Transformer (ViT).
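[DBK+20] turns an image into a token sequence by cutting it into fixed-size, non-overlapping patches (16x16 pixels, as in the title) and flattening each patch into a vector. A learned linear projection, position embeddings and a class token are then added before a standard Transformer encoder. The sketch below covers only the patch-splitting step, in pure Python. It assumes an H x W x C image stored as nested lists. `image_to_patches` is a hypothetical helper name.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are returned in raster order (left to right, top to bottom).
    Each patch is flattened row by row, so its length is
    patch_size * patch_size * C.
    """
    H = len(image)
    W = len(image[0])
    if H % patch_size or W % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, H, patch_size):
        for left in range(0, W, patch_size):
            flat = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # append all C channels of this pixel
            patches.append(flat)
    return patches


if __name__ == "__main__":
    # A 224x224 RGB image with 16x16 patches gives 14 * 14 = 196 tokens,
    # each of length 16 * 16 * 3 = 768 before the linear projection.
    image = [[[0.0] * 3 for _ in range(224)] for _ in range(224)]
    patches = image_to_patches(image, 16)
    print(len(patches), len(patches[0]))  # prints 196 768
```

The sequence length N = HW / P^2 shrinks quadratically as the patch size P grows, which is why ViT works on patches rather than individual pixels.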


DBK+20

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and others. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. URL: https://arxiv.org/pdf/2010.11929.

VSP+17

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017. URL: https://arxiv.org/pdf/1706.03762.