REAP-pruned Nemotron-3-Super (512 -> 256 experts) + GRPO fine-tune + FP8/AWQ. AIME 2026 90%+. Benchmark inside.

r/LocalLLaMA
Machine Learning AI Hardware AI Research Reinforcement Learning

Hey r/LocalLLaMA, Dropping a release I've been working on during AIMO3 (Kaggle competition). Took NVIDIA's Nemotron-3-Super-120B-A12B (latent MoE + Mamba2 hybrid), REAP-pruned from 512->256 experts (removed MTP layer too), LoRA-RL fine-tuned on ~270 AIMO3 + AstralMath problems with GRPO, then quantized to AWQ and FP8 for inference. Result: 120B -> 64B, runs on a single H100/RTX PRO 6000 Blackwell at 90%+ on AIME 2026.