Can an RTX 5090 with Qwen3.6 hit >3,000 tok/s? Bring your pitchforks (open-dllm)

So, some background. Fred Zhangzhi Peng, Shuibai Zhang and Alex Tong worked on converting AR models to diffusion models, and that conversion already works on older models. I forked their codebase and ran it through opencode overnight with the free deepseek-flash / GLM5.1 models to upgrade it to Qwen3.6, because the original is more than 6 months old. I also got the AI to mash in LDLM, a more recent paper by Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin and Dmitry Vetrov. These guys spent 3 years getting that paper working.