Best Models for 128gb VRAM: March 2026?

r/LocalLLaMA
Generative AI AI Tools

As the title suggests, what do you think is the best model for 128gb of vram? My use case is agentic coding via cline cli, n8n, summarizing technical documents, and occasional chat via openweb ui. No openclaw. For coding, I need it to be good at C++ and Fortran as I do computational physics. I am rocking qwen3.5 122b via vllm (nvfp4, 256k context at fp8 k cache) on 8 x 5070 ti on an epyc 7532 and 256gb of ddr4. The llm powers another rig that has the same cpu and ram config with a dual v100 32gb for fp64 compute. Both machine runs Ubuntu 24.04.