12GB-Club: 4070S qwen3.6 27b + 35b a3b, and Gemma 4 26b a4b + 31b speeds
r/LocalLLaMA
Longtime lurker here, thought I should post my speeeeds. I have an RTX 4070S with 12 GB VRAM (+10% OC), an AMD 9800X3D, and 4x16 GB DDR5 at 6000 MHz CL30. EDIT: I offload my display to my iGPU, btw, to save some VRAM on the RTX dGPU. Without that, expect performance to drop by 10% or so. EDIT2: I'm using this with CUDA 13.1. Please don't ask me how well the models do at actual tasks; everything works with no tool-call issues in VS Code with Cline and KiloCode, and they can use subagents too. I have not looked into pi-coding yet.