RTX 5070 Ti 16GB + 32GB RAM: Running Qwen3.6-35B-A3B Q8_0 @ 44 t/s (128K context)

r/LocalLLaMA

32GB DDR5 RAM.

unsloth/Qwen3.6-35B-A3B-GGUF Q8_0: 36.9 GB

LM Studio settings:
- GPU Offload: 40
- Offload MoE Experts to CPU: 26
- Try mmap: on
- K cache: Q8_0
- V cache: Q8_0

llama.cpp will be better.

submitted by /u/moahmo88
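For anyone wondering how a 36.9 GB file fits on a 16GB card, here is a rough back-of-the-envelope sketch of the split implied by the settings above. It is not a measurement. It assumes that "GPU Offload: 40" covers every layer of the model, that the layers are all the same size, and that the whole file is expert weights (`expert_fraction=1.0`), which overstates the CPU side. The function name and its parameters are made up for illustration.

```python
def estimate_split(model_gb, total_layers, cpu_moe_layers, expert_fraction=1.0):
    """Rough split of model weights between system RAM and VRAM.

    expert_fraction is the ASSUMED share of the file that is MoE expert
    weights. 1.0 is an upper bound for the CPU side, not a measured value.
    """
    # Assumption: every layer holds an equal slice of the expert weights.
    per_layer_expert_gb = model_gb * expert_fraction / total_layers
    # Experts of the first `cpu_moe_layers` layers stay in system RAM.
    cpu_gb = per_layer_expert_gb * cpu_moe_layers
    # Everything else (remaining experts, attention, etc.) goes to the GPU.
    gpu_gb = model_gb - cpu_gb
    return cpu_gb, gpu_gb


# Numbers from the post: 36.9 GB file, GPU Offload 40, MoE experts to CPU 26.
cpu_gb, gpu_gb = estimate_split(model_gb=36.9, total_layers=40, cpu_moe_layers=26)
print(f"CPU/RAM: ~{cpu_gb:.1f} GB, GPU/VRAM: ~{gpu_gb:.1f} GB")
```

Under these assumptions roughly 24 GB of weights sit in system RAM and roughly 13 GB on the GPU, which would leave about 3 GB of the 16GB VRAM for the Q8_0 KV cache and compute buffers. Real numbers will differ, since the non-expert weights are not spread the way this sketch pretends.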