Quick thoughts on Qwen3.5-35B-A3B-UD-IQ4_XS from Unsloth

r/LocalLLaMA

Just some quick thoughts on Qwen3.5-35B-A3B-UD-IQ4_XS now that I've finally got it working in the new version of Ooba (text-generation-webui). In short: on a 3090 it runs at around 100 t/s with almost no prompt processing time, and it fits roughly 250k tokens of context on the card without any KV cache quantization. Output quality is also quite good. My usual quick test is to have the model make something and chuck it on CodePen, and I've been trying (and failing) to get a local model to make a basic 3D snake game in Three.js. Until now.
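For anyone doing the math on how much context fits in 24 GB, here's a rough sketch of the usual back-of-the-envelope KV cache estimate: one K and one V vector per KV head, per layer, per token. The layer/head numbers in the example are made up for illustration and are NOT Qwen3.5-35B-A3B's real config. Pull the actual values from the model's config.json, and only count layers that actually keep a KV cache.

```python
def kv_cache_bytes(n_kv_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Rough KV cache size in bytes.

    2 (K and V) x layers with a KV cache x KV heads x head dim x bytes per element,
    multiplied by the number of tokens. bytes_per_elem=2 means an unquantized
    fp16/bf16 cache. Ignores allocator overhead and compute buffers.
    """
    per_token = 2 * n_kv_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_ctx


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    ctx = 250_000
    # Two hypothetical configs, just to show how much the layer/head count matters.
    small = kv_cache_bytes(n_kv_layers=10, n_kv_heads=2, head_dim=256, n_ctx=ctx)
    large = kv_cache_bytes(n_kv_layers=40, n_kv_heads=8, head_dim=128, n_ctx=ctx)
    print(f"10 KV layers, 2 KV heads:  {to_gib(small):.2f} GiB")
    print(f"40 KV layers, 8 KV heads: {to_gib(large):.2f} GiB")
```

Whatever the cache comes to has to fit alongside the quantized weights, so this is only a ballpark; the loader's own memory report is the real answer.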