Nemotron-Labs-Diffusion from NVIDIA
r/LocalLLaMA
Model Overview

Nemotron-Labs-Diffusion is a tri-mode language model that supports both autoregressive (AR) decoding and diffusion-based parallel decoding, switching between them at inference time simply by changing the attention pattern of the same model. The synergy between these two modes enables a third mode, called self-speculation: the same model performs diffusion-based parallel drafting and AR verification with a shared KV cache, achieving high acceptance lengths and decoding efficiency.
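
To make the three modes a bit more concrete, here is a minimal toy sketch in Python. It is not NVIDIA's implementation: the mask layout (causal prefix plus a bidirectional draft block), the greedy accept rule, and every name in it (`causal_mask`, `block_diffusion_mask`, `self_speculate_step`, `toy_draft`, `toy_ar_next`) are assumptions made for illustration. The shared KV cache is not modeled, and verification is written as a loop for readability, whereas a real model would presumably verify the whole draft in one AR forward pass.

```python
from typing import Callable, List


def causal_mask(n: int) -> List[List[bool]]:
    """AR mode: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_diffusion_mask(n: int, prefix_len: int) -> List[List[bool]]:
    """Diffusion mode (assumed layout): the prefix stays causal, while
    draft-block positions see the whole prefix and each other."""
    mask = causal_mask(n)
    for i in range(prefix_len, n):
        for j in range(prefix_len, n):
            mask[i][j] = True  # bidirectional attention inside the draft block
    return mask


def self_speculate_step(
    prefix: List[int],
    draft_fn: Callable[[List[int], int], List[int]],
    ar_next_fn: Callable[[List[int]], int],
    block_size: int,
) -> List[int]:
    """One self-speculation step with a greedy accept rule (an assumption):
    keep draft tokens while they match the AR prediction, and replace the
    first mismatch with the AR token."""
    draft = draft_fn(prefix, block_size)  # stands in for the diffusion-mode draft
    accepted: List[int] = []
    for tok in draft:
        target = ar_next_fn(prefix + accepted)  # stands in for AR-mode verification
        if tok != target:
            accepted.append(target)
            break
        accepted.append(tok)
    return accepted


# Hypothetical stand-ins for the two modes of the same model.
def toy_ar_next(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10  # "ground truth": count upward mod 10


def toy_draft(prefix: List[int], block_size: int) -> List[int]:
    out, last = [], prefix[-1]
    for k in range(block_size):
        last = (last + 1) % 10
        out.append(last if k < 2 else 9)  # right for two tokens, then wrong
    return out


if __name__ == "__main__":
    print(block_diffusion_mask(4, prefix_len=2))
    print(self_speculate_step([1, 2, 3], toy_draft, toy_ar_next, block_size=4))
```

In this toy run the draft is `[4, 5, 9, 9]`; the first two tokens match the AR prediction and the third is replaced by the AR token, so three tokens are committed in one step. The "acceptance length" in the overview refers to how many drafted tokens survive verification like this.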