REAP-pruned Nemotron-3-Super (512 -> 256 experts) + GRPO fine-tune + FP8/AWQ. AIME 2026 90%+. Benchmark inside.
r/LocalLLaMA
•
Machine Learning
AI Hardware
AI Research
Reinforcement Learning
Hey r/LocalLLaMA, Dropping a release I've been working on during AIMO3 (Kaggle competition). Took NVIDIA's Nemotron-3-Super-120B-A12B (latent MoE + Mamba2 hybrid), REAP-pruned from 512->256 experts (removed MTP layer too), LoRA-RL fine-tuned on ~270 AIMO3 + AstralMath problems with GRPO, then quantized to AWQ and FP8 for inference. Result: 120B -> 64B, runs on a single H100/RTX PRO 6000 Blackwell at 90%+ on AIME 2026.