Qwen 3.5 35B on 8GB VRAM for local agentic workflow

I had recently been using Antigravity, mostly for the vibe coding I needed, but the usage limits have hit hard (I have the Google AI Pro yearly plan). So I pivoted to local LLMs to augment it. After extensive testing of different models, I have settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).

My specs (Lenovo Legion):
CPU: i9-14900HX (8 P-cores, E-cores disabled in BIOS)
RAM: 32GB DDR5
GPU: RTX 4060m (8GB VRAM)

Currently I am getting about 700 t/s for prompt processing and 42 t/s for token generation, which is respectable for an 8GB VRAM GPU.
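
To put those two numbers in context for agentic use, here is a rough back-of-the-envelope sketch of how long a single turn might take. It only divides token counts by the rates quoted above. The function name and the example token counts (7000 prompt tokens, 420 output tokens) are made up for illustration, and it assumes both rates stay constant regardless of context length, which is usually not true in practice.

```python
def estimate_turn_seconds(prompt_tokens, output_tokens,
                          pp_rate=700.0, tg_rate=42.0):
    """Rough time for one request, in seconds.

    pp_rate: prompt processing speed in tokens/s (about 700 in this post)
    tg_rate: token generation speed in tokens/s (about 42 in this post)
    Assumes constant rates and no prompt caching (a simplification).
    """
    prefill = prompt_tokens / pp_rate   # time to read the prompt
    decode = output_tokens / tg_rate    # time to write the reply
    return prefill + decode


# Hypothetical agent turn: 7000 tokens of context in, 420 tokens out.
print(f"{estimate_turn_seconds(7000, 420):.1f} s")  # prints "20.0 s"
```

With these illustrative numbers, prompt processing and generation each take about 10 seconds, so long contexts and long replies both matter for how responsive the agent feels.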