96GB VRAM. What to run in 2026?

r/LocalLLaMA
Open Source AI

I was all set on going the 4x 3090 route, but with the current releases of Qwen 3.5 and Gemma 4, I am having second thoughts. 96GB of VRAM seems to be in a weird spot: not enough to run the larger models, and more than needed for the mid-sized ones. What are you running as your main model?

submitted by /u/inthesearchof