Benchmarking Qwen3.5-35B-3AB on 8 GB VRAM gaming laptop: 26 t/s at 100k context window

r/LocalLLaMA
Generative AI AI Hardware Open Source AI

Hey everyone, I've seen a couple of benchmarks recently and thought this one may be interesting to some of you as well. I'm GPU poor (8 GB VRAM) but still need 'large' context windows from time to time when working with local LLMs to process sensitive data/code/information. The 35B-A3B model of the new generation of Qwen models has proven to be particularly attractive in this regard. Surprisingly, my gaming laptop with 8 GB of VRAM and 64 GB RAM achieves about 26 t/s with 100k context size.