Qwen 3.6 35 UD 2 K_XL is punching above its weight and quantization (No one is GPU Poor now)

r/LocalLLaMA

Hi guys, back again. I tested the Qwen 3.6 UD 2 K_XL Unsloth model on the same paper-to-web-app task, and it performed very well. It handled the tool calls properly and also managed a large context using llama.cpp on a laptop with 16GB of VRAM. I have attached all the details: there were 58 tool calls in total, with a success rate of 98.3%. The model also processed around 2.7M tokens while building the app from the given paper.
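For anyone wanting to sanity-check the numbers: assuming the success rate is simply successful calls divided by total calls (my assumption, the attached details may define it differently), here is a quick Python sketch that finds which success counts out of 58 round to 98.3%. The function names are made up for illustration.

```python
def success_rate(successes: int, total: int) -> float:
    """Success rate as a percentage, assuming rate = successes / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total


def implied_successes(total: int, reported_pct: str) -> list[int]:
    """Success counts whose rate, shown to one decimal place, matches the reported figure."""
    return [
        k
        for k in range(total + 1)
        if f"{success_rate(k, total):.1f}" == reported_pct
    ]


# 58 total calls and 98.3% are the figures from the post.
print(implied_successes(58, "98.3"))
```

Under that assumption, only one count matches, which would mean a single failed tool call over the whole run.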