Why do people care more about tokens/s in decoding?

r/LocalLLaMA • May 06, 2026

Generative AI

What I've noticed while using local LLMs recently is that in most cases the bottleneck is not decoding but prompt processing.
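To make this concrete, here is a rough back-of-the-envelope sketch. The function name and all the numbers (a 16,000-token prompt, 300 output tokens, 400 tokens/s prompt processing, 30 tokens/s decoding) are made-up assumptions for illustration, not measurements from any particular model or hardware.

```python
def latency_breakdown(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Split total response time into prompt processing and decoding.

    Returns (prefill_seconds, decode_seconds, prefill_share), where
    prefill_share is the fraction of total time spent on the prompt.
    """
    prefill_s = prompt_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps     # time to generate the answer
    return prefill_s, decode_s, prefill_s / (prefill_s + decode_s)


# Hypothetical workload: long context in, short answer out.
prefill_s, decode_s, share = latency_breakdown(
    prompt_tokens=16_000,
    output_tokens=300,
    prefill_tps=400.0,   # assumed prompt-processing speed
    decode_tps=30.0,     # assumed decoding speed
)
print(f"prompt processing: {prefill_s:.1f} s, "
      f"decoding: {decode_s:.1f} s, prompt share: {share:.0%}")
# prints "prompt processing: 40.0 s, decoding: 10.0 s, prompt share: 80%"
```

With these assumed numbers, prompt processing takes most of the wait even though its tokens/s figure is over ten times higher than decoding. The balance flips for short prompts with long outputs, so it depends on the workload.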