Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)
r/LocalLLaMA
Following up on our previous post about running Qwen3.6-27B on a single RTX 3090 (~125K context, higher TPS). We’ve been pushing further on both context length and stability for tool-agent workloads.

Current results:

- ~218K context @ ~50 / 66 TPS (text, narr/code)
- ~198K + vision @ ~51 / 68 TPS
- tool calls with ~25K-token outputs now complete without OOM

So lower TPS than our earlier config, but significantly higher context + stability under real workloads.

---

### What changed

Previously, long tool outputs (~25K tokens) would consistently crash.