z-lab released gemma-4-26B-A4B-it-DFlash. Anybody tried it yet?
r/LocalLLaMA
•
Open Source AI
Past few days, its all been about MTPs. Somehow people missed out the fact that Z lab released the Dflash for Gemma4 26B a couple of days ago. As far as my understanding goes, Dflash should be a better alternative than MTP because of faster parallel block diffusion drafting and the fact that it is stateful (it can have a persistent state across iterations for context buffers, KV cache positions, and RoPE offsets). This basically should mean that dflash should be drastically better as the session extends and context grows.