Attention Is All You Need

Revolutionising machine learning through Transformer models.

10 mins
What is it about?

This paper introduces the Transformer, a new machine learning architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

Imagine you’re playing with blocks, each showing a different animal. You want to build a tower in which land animals and water animals sit side by side, but you can hold only a few blocks at a time.

You could handle one block at a time, deciding where it goes in the tower before picking up the next. This resembles a Recurrent Neural Network (RNN), which processes a sequence one element at a time and carries a memory of earlier elements forward to inform each decision.

Alternatively, you might evaluate several neighbouring blocks together, like a Convolutional Neural Network (CNN). CNNs examine local chunks of the input and combine them layer by layer to build up the bigger picture.

The Transformer takes a third approach: lay out all the blocks and look at them at once. Using “attention”, it processes every piece in parallel and weighs how strongly each piece relates to every other, no matter how far apart they sit in the sequence.
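
To make the "look at all the blocks at once" idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation in the paper: softmax(QK^T / sqrt(d_k)) V. It is a toy illustration, not the paper's full model: it leaves out the learned projections, multiple heads, and positional encodings, and the block vectors and animal names are made up for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))  # sqrt(d_k) from the paper's formula
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key; no step depends on a previous one.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Blend the value vectors according to those weights.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy "blocks": 2-d vectors of [land-ness, water-ness].
blocks = [
    [1.0, 0.0],  # lion  (assumed example)
    [0.0, 1.0],  # fish  (assumed example)
    [1.0, 0.0],  # horse (assumed example)
]

# Self-attention: the blocks serve as queries, keys, and values alike.
outputs, weights = attention(blocks, blocks, blocks)

for row in weights:
    print([round(w, 3) for w in row])
```

In this toy setup the lion block gives the horse block, two positions away, the same weight it gives itself, and less weight to the adjacent fish block: relevance is decided by content rather than by distance. Each query's row of weights is computed independently of the others, which is what allows the real model to process all positions in parallel.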