The Big LLM Architecture Comparison
https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
positional encoding: absolute, RoPE
attention: multi-head attention (MHA), grouped-query attention (GQA)
activation: GELU, SwiGLU
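
A rough sketch of the newer option in each pair above, in plain Python so it runs without any framework. None of this is taken from the linked article: the function names (`rope`, `kv_head_for`, `swiglu`), the `base=10000.0` default, and the choice to rotate consecutive pairs of dimensions are illustrative assumptions. Real implementations work on batched tensors, and some pair dimension i with i + d/2 instead of i + 1.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU (Swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """SwiGLU gating: SiLU(gate) * up, elementwise.

    In a feed-forward block, `gate` and `up` would be two separate linear
    projections of the same input; here they are passed in directly.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def rope(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotary position embedding on one query/key vector of even length.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / d). Unlike an absolute embedding, nothing is
    added to the vector; position enters only through the rotation.
    """
    d = len(vec)
    out: list[float] = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: index of the key/value head a query head uses.

    n_kv_heads == n_query_heads gives ordinary multi-head attention;
    n_kv_heads == 1 gives multi-query attention.
    """
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size
```

With 8 query heads and 2 key/value heads, `kv_head_for(5, 8, 2)` returns 1: query heads 4 through 7 share the second key/value head. For RoPE, the dot product of `rope(q, m)` and `rope(k, n)` depends only on the offset m - n, which is the property that makes it a relative encoding.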