Transformer
The neural network architecture behind nearly every modern AI model, introduced in 2017.
Transformers process sequences of tokens by attending to relationships between every pair of tokens via the 'self-attention' mechanism. The original 'Attention Is All You Need' paper (Vaswani et al., 2017) showed this approach beat RNNs and LSTMs at machine translation, and it has since become the architecture of choice for almost everything.
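To make 'attending to every pair of tokens' concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The matrix names and the toy dimensions (4 tokens, dimension 8) are illustrative assumptions, not taken from any particular model, and real implementations add multiple heads, masking, and learned projections.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # Project each token embedding into query, key, and value vectors
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d_k = Q.shape[-1]
        # Every token scores its relationship to every other token
        scores = Q @ K.T / np.sqrt(d_k)
        # Softmax over each row turns scores into attention weights
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each output is a weighted mix of all tokens' value vectors
        return weights @ V

    # Toy example: 4 tokens, embedding dim 8 (hypothetical sizes)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)
    print(out.shape)  # (4, 8)

The key property is that the scores matrix is all-pairs: every token can draw on every other token in one step, rather than passing information sequentially as an RNN does.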
Variants include encoder-only (BERT, used for understanding tasks like classification), decoder-only (GPT and friends, used for generation), and encoder-decoder (T5, FLAN-T5). Most chat LLMs are decoder-only.
Despite a huge research effort to find a 'next' architecture (state space models like Mamba, mixture-of-experts layers, hybrids of the two), transformers remain the dominant design in 2026, largely because the surrounding ecosystem of tooling, hardware, and training recipes is built around them.