Long-context coding on RTX 5080 16GB: Qwen3.6-35B-A3B holds 30 t/s at 128K (89 t/s fresh), no quality drop

r/LocalLLaMA
Generative AI AI Research

I wanted to see how much of my coding-agent workflow I could move local instead of paying for hosted tools forever. There was another push: Anthropic's own April 23 postmortem confirmed product-layer regressions through March/April. With a local model, what you benchmark is what you get. The other constraint was context. I needed something that stayed usable at 65K-128K minimum. I had an RTX 5080 16GB sitting idle most of the day. Qwen3.6 had been getting enough praise for coding that it seemed worth testing seriously.