What's your tps on a 3090 + Qwen 3.6 27B in real tasks?
r/LocalLLaMA
I struggle to wrap my head around all this. My goal is a local agent that solves low-complexity tasks in the same harness where I would use frontier models. Naturally, this means a large context window, because a low-complexity task can be a simple-ish fix in a large codebase rather than generating some nonsense from scratch. So I initially went with Tom's turboquant plus fork of llama.cpp (I'm on Windows), with Qwen 3.6 Q4 and IQ4 models and a 200k context window. Well, it worked: it can read the entirety of the example project I gave it and make an audit (as much as it's capable of making one).