I finally found the best 5070 TI + 32GB ram GGUF model

r/LocalLLaMA
Generative AI Open Source AI

It's the Gemma 4 26B A3B IQ4 NL. My llama.cpp command is: llama-server.exe -m "gemma-4-26B-A4B-it-UD-IQ4_NL.gguf" -ngl 999 -fa on -c 65536 -ctk q8_0 -ct q8_0 --batch-size 1024 --ubatch-size 512 --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0 --repeat-penalty 1.0 --no-warmup --port 8080 --host 0.0.0.0 --chat-template-kwargs "{\"enable_thinking\":true}" --perf In essence, this is just the recommended setting's from Google, but this has served me damn well as a co-assistant to Claude Code in VS Code. I gave it tests, and it's around 6.5/10.