Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding - Google Developers Blog
r/LocalLLaMA
Generative AI
Submitted by /u/eternviking