Considering two Sparks for local coding
r/LocalLLaMA
•
Generative AI
I'm currently running a 4x RTX 3090 system (96GB VRAM, DDR4 2133 RAM) and have tested opencode and pi.de using Qwen3.5-122B-A10B (AWQ) up to 200k context for web app coding (html/js/python). I'm now seriously considering picking up two Sparks paired with MiniMax M2.7 for local inference. Two units are needed to keep prompt processing at acceptable speeds. Output tokens/sec stays the same regardless (~15 tok/s at ~100k context, based on what I've seen here). Combined 2 * 128 GB = 256 GB VRAM leaves headroom for future models (next MiniMax version, Qwen3.6-122B.