What is a Transformer?
A plain-English explanation of the Transformer architecture — what it means, why it matters, and how it is used in AI.
Transformer
The Transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need" that became the foundation for virtually all modern large language models. It uses a mechanism called self-attention to weigh the importance of different words in a sequence relative to each other.
"BERT, GPT-4, and Claude are all built on the Transformer architecture, which is why they can understand context across long passages of text."
Also known as: Transformer architecture, attention model
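The self-attention idea in the definition above can be shown in a few lines of code. This is a minimal sketch of scaled dot-product self-attention (the projection matrices and multiple heads used in real Transformers are omitted for brevity); the function name and the toy input are illustrative, not from any particular library.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d) array. Here queries, keys, and values are all X
    itself; real models first multiply X by learned projection matrices.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarity of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ X                               # each output is a weighted mix of all tokens

# three toy 4-dimensional "token" vectors
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
out = self_attention(X)
print(out.shape)  # one updated vector per input token: (3, 4)
```

Because every token attends to every other token in one step, context from anywhere in the sequence can influence each output vector, which is what lets Transformer-based models track meaning across long passages.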
Why does the Transformer matter?
Transformers power today's language models, image generation models, and multimodal AI systems. Because self-attention processes all positions in a sequence in parallel, the architecture trains far more efficiently on modern hardware than the recurrent networks it replaced.
Practice this term
The best way to remember Transformer is to practice unscrambling it. AI Terminology Scrambler uses spaced repetition to help you learn and retain AI vocabulary in just a few minutes a day.
Practice Transformer now →