Pushing the limit: minimax m2.7 q8_0 128k on 2x3090, 256GB DDR4

r/LocalLLaMA
AI Research

The CPU is just a secondhand i9-10900X. I'm using 128k context with an unquantized K cache. The model is at q8_0 to mitigate some weird behavior I was seeing at lower quants. Speed is very slow, at around 50 tok/s prompt processing (pp) and 10 tok/s token generation (tg), but it's usable for coding agent workflows. Is anybody else running MoE models in this size class on relatively low-end hardware? For my purposes, accuracy matters more than speed, as long as a task doesn't take literally all day.
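For anyone sizing a similar build, here is a rough back-of-the-envelope sketch of what those numbers mean. It assumes GGML's q8_0 format, which stores blocks of 32 int8 weights plus one fp16 scale (34 bytes per 32 weights, i.e. 8.5 bits per weight). The 230B total parameter count is a hypothetical input, not a confirmed figure for this model. The turn-time estimate also assumes that the server reuses the KV cache for the unchanged prompt prefix between agent turns, so only new tokens need prefill. KV cache memory and runtime overhead are ignored.

```python
# Back-of-the-envelope estimates for a q8_0 MoE setup on 2x3090 + 256 GB RAM.
# All inputs are illustrative assumptions, not measurements.

Q8_0_BITS_PER_WEIGHT = 8.5  # q8_0 block: 32 int8 weights + one fp16 scale = 34 bytes


def q8_0_weight_gb(total_params: float) -> float:
    """Approximate size of the weights alone at q8_0, in GB (1e9 bytes)."""
    return total_params * Q8_0_BITS_PER_WEIGHT / 8 / 1e9


def turn_seconds(new_prompt_tokens: int, output_tokens: int,
                 pp_tps: float = 50.0, tg_tps: float = 10.0) -> float:
    """Wall-clock time for one turn: prefill the new tokens, then generate.

    Assumes the cached prefix is reused, so only new prompt tokens are prefilled.
    """
    return new_prompt_tokens / pp_tps + output_tokens / tg_tps


if __name__ == "__main__":
    vram_gb = 2 * 24  # two RTX 3090s at 24 GB each
    ram_gb = 256
    params = 230e9    # hypothetical total parameter count
    print(f"weights at q8_0: ~{q8_0_weight_gb(params):.0f} GB "
          f"of {vram_gb + ram_gb} GB VRAM+RAM")
    # Hypothetical turn: 4,000 new prompt tokens (tool output, file contents), 500 output tokens
    print(f"one agent turn: ~{turn_seconds(4000, 500) / 60:.1f} min")
    # Worst case: prefilling a full 128k context from scratch
    print(f"full 128k prefill: ~{turn_seconds(131072, 0) / 60:.0f} min")
```

With these inputs, the script prints roughly 244 GB of weights against 304 GB of combined memory, about 2.2 minutes per incremental turn, and about 44 minutes for a cold 128k prefill. At 50 tok/s, prefill rather than generation is what dominates whenever the cache misses.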