Understanding Decoder-Only Transformers Part 2: Decoder-Only vs Regular Transformers
Dev.to AI
In this article, we will look at the differences between a decoder-only transformer and a standard (encoder-decoder) transformer.

How Decoder-Only Transformers Work

A decoder-only transformer uses the same components to process the input prompt and to generate the output. It relies on masked self-attention, in which each word attends only to itself and the words that came before it, never to the words that come after.

Masked self-attention is applied to both the input prompt and the generated output. This means the entire process is handled by a single stack of decoder layers, with no separate encoder.
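To make the masking idea concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the helper names (`causal_mask`, `softmax`, `masked_attention_weights`), the example tokens, and the all-zero scores are assumptions chosen for illustration. A real model would compute the scores from learned query and key vectors; here only the masking and normalization steps are shown.

```python
import math

def causal_mask(n):
    """Return an n x n grid where [i][j] is True if position i may look at position j."""
    return [[j <= i for j in range(n)] for i in range(n)]

def softmax(xs):
    """Numerically stable softmax over a list of floats (-inf entries become 0)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(scores):
    """Hide future positions in a square score matrix, then normalize each row."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        # Positions after i are set to -inf so they receive zero weight.
        row = [scores[i][j] if mask[i][j] else float("-inf") for j in range(n)]
        weights.append(softmax(row))
    return weights

# Hypothetical sequence: three prompt words followed by one generated word.
# The same masking is applied to all of them, prompt and output alike.
tokens = ["The", "cat", "sat", "down"]

# Placeholder scores (all equal) standing in for real query-key dot products.
scores = [[0.0] * len(tokens) for _ in tokens]

weights = masked_attention_weights(scores)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Because the placeholder scores are all equal, each row spreads its weight evenly over the current word and the words before it, and every later position gets a weight of zero. The prompt words and the generated word go through exactly the same function, which mirrors the point that a single decoder stack handles both.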