Papers
2017 Attention Is All You Need
https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
transformer
2018 Improving Language Understanding by Generative Pre-Training
GPT-1
2019.06 Language Models are Unsupervised Multitask Learners
https://storage.prod.researchhub.com/uploads/papers/2020/06/01/language-models.pdf
GPT-2, 1.5B parameters
2020 Language Models are Few-Shot Learners
https://proceedings.nips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
GPT-3
2023.02 LLaMA: Open and Efficient Foundation Language Models
2023.12 GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
2025.05 Qwen3 Technical Report
2023.11 A Survey of Large Language Models
2025.05 Large Language Models: A Survey