Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration

arXiv:2602.11937v2 Announce Type: replace Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We extend and apply Puzzle, a post-