I wrote a PowerShell script to sweep llama.cpp MoE nCpuMoe vs batch settings
r/LocalLLaMA
Hi all, I have been playing around with Qwen 3.5 MoE models and found that the speed sweet spot between nCpuMoe and batch size isn't linear. I also kept rerunning the same tests across different quants, which got tedious. If a tool or script that does this already exists and I missed it, let me know (I didn't find any).

How it works:

1. Start at your chosen lowest nCpuMoe and batch size, and benchmark that as the baseline.
2. Increase the batch size (using binary search) and run benchmarks.
3. Keep track of the best run (based on your selected metric, i.e.