The Big LLM Architecture Comparison

https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison

  • positional encoding: absolute → rotary (RoPE)

  • attention: Multi-Head Attention (MHA) → Grouped-Query Attention (GQA)

  • activation: GELU → SwiGLU
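Positional encoding, absolute vs. RoPE: the sketch below contrasts the two approaches in plain Python. It is illustrative only. `sinusoidal_absolute` is the fixed sinusoidal scheme from the original Transformer, a vector added to the token embedding. GPT-2 instead learns a position-embedding table, but it is added in the same way. `rope` rotates consecutive pairs of query/key dimensions by an angle proportional to the position. The helper names, the interleaved pairing (many codebases split the vector into halves instead), and the base of 10000 are assumptions of this sketch.

```python
import math

def sinusoidal_absolute(pos, dim):
    """Absolute encoding: a vector that is ADDED to the embedding at position `pos`."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc[:dim]

def rope(vec, pos, base=10000.0):
    """Rotary encoding (hypothetical helper): rotate pairs (x0, x1), (x2, x3), ...
    of a query or key vector by pos * theta_i, with theta_i = base**(-2i/dim)."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        # 2-D rotation of the pair; vector norms are preserved
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The attention score between a rotated query and key depends only on the
# offset between their positions (here 2 in both cases), not on absolute position.
q = [0.1, 0.2, 0.3, 0.4]
k = [0.5, -0.1, 0.2, 0.3]
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

This relative-position property is the main design argument for RoPE. It is applied inside attention to queries and keys rather than once to the input embeddings.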
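Attention, MHA vs. GQA: in Grouped-Query Attention, several query heads share one key/value head. Plain multi-head attention is the special case where the number of KV heads equals the number of query heads. The pure-Python sketch below shows only that grouping. Projections, batching, and output mixing are omitted, and the function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, k, v, causal=True):
    """Scaled dot-product attention for ONE head; q, k, v are [seq][d] lists."""
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        limit = t + 1 if causal else len(k)  # causal: attend to positions 0..t only
        scores = [dot_scaled(q_t, k[s], d) for s in range(limit)]
        weights = softmax(scores)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out

def dot_scaled(a, b, d):
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(d)

def grouped_query_attention(q_heads, k_heads, v_heads, causal=True):
    """n_q query heads, n_kv key/value heads (n_q divisible by n_kv).
    Query head h reads KV head h // (n_q // n_kv); n_kv == n_q gives plain MHA."""
    n_q, n_kv = len(q_heads), len(k_heads)
    assert n_q % n_kv == 0 and len(v_heads) == n_kv
    group_size = n_q // n_kv
    return [attend(q_heads[h], k_heads[h // group_size], v_heads[h // group_size], causal)
            for h in range(n_q)]

def kv_cache_entries(n_layers, n_kv_heads, head_dim, seq_len):
    """Cached numbers: keys + values for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len
```

The memory saving comes from caching fewer KV heads. For example, with 32 query heads, `kv_cache_entries` for 8 KV heads is a quarter of what 32 KV heads (MHA) would need at the same layer count, head size, and sequence length.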
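Activation, GELU vs. SwiGLU: the two feed-forward variants can be compared in a few lines. A GELU block applies one nonlinearity between two matrices. SwiGLU adds a third matrix and uses a SiLU-activated branch to gate it element-wise. The sketch omits biases and uses the exact erf form of GELU, whereas GPT-2 used a tanh approximation. The function names and tiny matrices are illustrative assumptions.

```python
import math

def gelu(x):
    # exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    # SiLU / Swish (beta = 1): x * sigmoid(x), written to avoid exp overflow
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def gelu_ffn(x, w1, w2):
    """GPT-2-style feed-forward: W2 · GELU(W1 · x)."""
    return matvec(w2, [gelu(h) for h in matvec(w1, x)])

def swiglu_ffn(x, w1, w3, w2):
    """SwiGLU feed-forward: W2 · (SiLU(W1 · x) ⊙ (W3 · x))."""
    gate = [silu(h) for h in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])

# toy sizes: model dim 2, hidden dim 3
x = [1.0, -2.0]
w1 = [[0.5, 0.1], [0.2, -0.3], [0.0, 1.0]]
w3 = [[0.3, 0.3], [-0.1, 0.4], [0.2, 0.0]]
w2 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
print(gelu_ffn(x, w1, w2), swiglu_ffn(x, w1, w3, w2))
```

Because SwiGLU has three weight matrices instead of two, models commonly shrink its hidden size (Llama, for instance, uses about 2/3 of 4·d_model) to keep the parameter count comparable.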