Why do people care more about tokens/s in decoding?
r/LocalLLaMA
What I've noticed while using local LLMs recently is that, in most cases, the bottleneck is not decoding (token generation) but prompt processing (prefill).
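A rough back-of-envelope model shows why. The wait for a reply is about (prompt tokens ÷ prompt-processing speed) + (output tokens ÷ generation speed). The sketch below assumes that simple linear model. In reality prefill speed usually drops as the context grows, and prompt caching can skip part of the prompt. The speeds are made-up illustrative numbers, not benchmarks, and `request_latency` is just a hypothetical helper name.

```python
def request_latency(prompt_tokens: int, output_tokens: int,
                    pp_tok_per_s: float, tg_tok_per_s: float) -> dict:
    """Split end-to-end latency into prefill (prompt processing) and decode time.

    Assumes constant speeds, which is a simplification.
    """
    prefill_s = prompt_tokens / pp_tok_per_s  # time before the first output token appears
    decode_s = output_tokens / tg_tok_per_s   # time spent generating the reply
    return {"prefill_s": prefill_s, "decode_s": decode_s, "total_s": prefill_s + decode_s}


# Hypothetical speeds: 400 tok/s prompt processing, 20 tok/s generation.
long_ctx = request_latency(prompt_tokens=16000, output_tokens=300,
                           pp_tok_per_s=400.0, tg_tok_per_s=20.0)
short_chat = request_latency(prompt_tokens=500, output_tokens=300,
                             pp_tok_per_s=400.0, tg_tok_per_s=20.0)
print(long_ctx)    # prefill dominates with a long prompt (RAG, big files, long chats)
print(short_chat)  # decode dominates in a short back-and-forth chat
```

With a 16k-token prompt, prefill takes 40 s versus 15 s of decoding. With a short 500-token prompt it is only 1.25 s, so decode speed is all you notice. That may be why tokens/s in decoding gets quoted so much: it matches the short-prompt chat case.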