Wow! Qwen 3.6:35b-a3b on a 3090... pretty amazing.

I've been using Anthropic and OpenAI for a year. I tried Ollama once, it was so slow that I totally wrote off local models. But I guess things have changed. Last weekend I picked up a used gaming rig with a 3090, and yesterday I set up Qwen 3.6:35b-a3b. I grabbed a quantized build that had been squeezed down to 20GB (batiai/qwen3.6-35b:iq4) so the whole thing fits in the 3090's 24GB of VRAM. When it was running from system RAM it was doing a respectable 15 tps on output, but once I got it all stuffed into VRAM its output jumped to 160 tps.

Then I fed it a picture. The image processing took 75 seconds, but... wow. Just wow.
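If you want to check numbers like these yourself, here is a minimal sketch. It assumes the model is served through Ollama's local REST API on its default port (11434). It sends a non-streaming request to `/api/generate`, optionally with a base64-encoded image, and computes output speed from the `eval_count` and `eval_duration` (nanoseconds) fields of the response. The helper names are hypothetical, and the model tag is simply the one from this post.

```python
import base64
import json
import urllib.request
from typing import Optional

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the post; assumes it has already been pulled into Ollama.
MODEL = "batiai/qwen3.6-35b:iq4"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Output speed: Ollama reports eval_duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def build_request(prompt: str, image_bytes: Optional[bytes] = None) -> dict:
    """Build a non-streaming /api/generate payload; images go in as base64 strings."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(prompt: str, image_bytes: Optional[bytes] = None) -> dict:
    """POST the request and return the final JSON response (needs a running server)."""
    body = json.dumps(build_request(prompt, image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `r = generate("Describe this picture.", open("photo.jpg", "rb").read())` (with `photo.jpg` standing in for any local image) followed by `tokens_per_second(r["eval_count"], r["eval_duration"])` should give the output rate. `r["total_duration"] / 1e9` gives the wall-clock seconds for the whole request, image handling included.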