Your LLM Is Guessing Ahead. Then It Checks Itself aka Speculative Decoding

Towards AI
Generative AI

Every token your LLM generates costs one full forward pass. One pass, one token. No shortcuts.