Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer
r/LocalLLaMA
The angle here is native Windows, no WSL. I'm not selling or promoting anything.

Numbers (RTX 3090, Windows 10):

- 72 tok/s short prompt
- 64.5 tok/s long prompt (~25k tokens)
- 53.4 tok/s at 127k ctx (single GPU)
- 160k ctx on PP=2 (2×3090 GPUs)

Honestly, these aren't r/LocalLLaMA records. The community has hit 80-82 tok/s on a 3090 with TurboQuant 3-bit KV, and 160 tok/s on a 5090 on Linux. My launcher and patched vLLM close that gap on Windows.

Simple installation:

1. Download qwen3.6-windows-server-portable-x64.zip from the Release
2. Unzip anywhere. No admin, no pip, no Python required
3.