How Attention Works in Transformers, Interactively

How models decide what matters

Embeddings give every token a fixed starting meaning, but meaning depends on context: the word “it” means nothing on its own. Self-attention is the mechanism that lets every token in a sentence look at every other token and decide which ones matter for interpreting it. Each token scores every token in the sentence, itself included, and those scores become the weights it uses to blend their information into its own representation. In the classic example on this page, “The animal didn’t cross the street because it was tired,” attention is how the model links “it” back to “animal” rather than “street”; change “tired” to “wide” and the link flips to “street.”
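To make the scoring step concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the formulation used in the original transformer paper. It is an illustration under stated assumptions, not the demo's actual code: the function names, the dimensions (d_model = 8, d_k = 4), and the random embeddings and projection matrices are all made up, so the resulting weights carry no linguistic meaning. In a trained model these matrices are learned, which is what produces links like “it” → “animal.”

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X has shape (seq_len, d_model); each W_* has shape (d_model, d_k).
    Returns the attended output and the (seq_len, seq_len) weight matrix.
    """
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers to others
    V = X @ W_v  # values: the information that gets blended
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token scores every token
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

# Hypothetical setup: random vectors stand in for learned embeddings and weights.
rng = np.random.default_rng(0)
tokens = ["The", "animal", "didn't", "cross", "the",
          "street", "because", "it", "was", "tired"]
d_model, d_k = 8, 4
X = rng.normal(size=(len(tokens), d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)

# Row i of `weights` says how much token i attends to every token.
it_row = weights[tokens.index("it")]
for token, w in zip(tokens, it_row):
    print(f"{token:>8}: {w:.3f}")
```

The row printed at the end corresponds to what the hover interaction visualizes: one token's distribution of attention over the whole sentence.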

Heads, layers, and why this scaled

Real transformers run many attention heads in parallel (one might track subject–verb relationships while another tracks pronoun references) and stack many layers, often dozens, so later layers build on earlier ones to capture increasingly abstract relationships. Building a model out of stacked multi-head attention is the core idea of the 2017 paper “Attention Is All You Need,” and it is arguably the mechanism most responsible for the capabilities of GPT-4, Claude, and most other modern large language models. The interactive demo lets you hover over each word to see its attention links, then goes deeper with the concept, the actual code, and a challenge.
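The heads-and-layers idea can be sketched in the same style. The code below is a simplified illustration, not a full transformer block: it assumes NumPy, uses hypothetical names (multi_head_attention, stack_layers) and made-up sizes, initializes weights randomly, and omits the layer normalization, feed-forward sublayers, and masking that real models include. It keeps only the two ideas from the paragraph above: several heads attending in parallel, and layers applied one after another.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Run n_heads attention heads in parallel over X (seq_len, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # (n_heads, seq_len, seq_len)
    heads = weights @ V                 # (n_heads, seq_len, d_head)
    # Concatenate the heads back into one vector per token, then mix them.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights

def stack_layers(X, layers, n_heads):
    """Apply attention layers in sequence so each builds on the previous one."""
    for W_q, W_k, W_v, W_o in layers:
        attended, _ = multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads)
        X = X + attended  # residual connection; layer norm and MLP omitted
    return X

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads, n_layers = 10, 16, 4, 3
X = rng.normal(size=(seq_len, d_model))
layers = [
    tuple(0.1 * rng.normal(size=(d_model, d_model)) for _ in range(4))
    for _ in range(n_layers)
]

out, weights = multi_head_attention(X, *layers[0], n_heads=n_heads)
stacked = stack_layers(X, layers, n_heads)
print(weights.shape)  # one attention map per head
print(stacked.shape)  # same shape as the input, ready for another layer
```

Each head gets its own slice of the projected vectors, so the heads can learn different attention patterns, and because every layer returns the same shape it receives, layers can be stacked to any depth.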

howaiworks.io is free and open source (GitHub), built by Matt Feroz.