Benchmark MiniMax-M2.5 on 8*H20 perf test
r/LocalLLaMA
•
AI Research
With the recent release of MiniMax-M2.5, I wanted to see how this MoE beast performs on a specialized high-memory cluster. I ran a series of comprehensive stress tests using SGLang on an 8x H20 (141GB) node. The H20 might have capped compute compared to the H100, but with 1.1TB+ of total VRAM, it's a hidden gem for high-concurrency inference and long-context MoE models. The VRAM is plenty, but I'm currently migrating to a PD separation (Disaggregation) setup to optimize the TTFT and decoding throughput submitted by /u/MathematicianNo2877 [link] [comments.