Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE

arXiv:2602.02443v2

Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-grained mixture-of-experts (MoE) models, with hundreds of well-trained experts per layer and multiple experts activated per token, offer an unexplored alternative source of candidate diversity through their rich routing space.
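The sketch below makes two points concrete: the temperature trade-off in token-level sampling, and how large the routing space of a fine-grained MoE layer is. The logits and the layer configuration (256 experts, 8 active per token) are hypothetical values chosen for illustration. This is not the paper's Expert-Sample method, which the abstract does not specify.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more diverse sampling distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def routing_space_size(num_experts, top_k):
    """Number of distinct expert subsets a top-k router can select in one layer."""
    return math.comb(num_experts, top_k)


if __name__ == "__main__":
    logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token logits
    for t in (0.2, 1.0, 2.0):
        p = softmax_with_temperature(logits, t)
        print(f"T={t}: top-token prob={max(p):.3f}, entropy={entropy(p):.3f}")
    # Hypothetical fine-grained MoE layer: 256 experts, 8 active per token.
    print("expert subsets per layer:", routing_space_size(256, 8))
```

A low temperature concentrates probability on the top token, which gives stable but similar candidates. A high temperature flattens the distribution, which adds diversity at the cost of noisier outputs. The combinatorial count shows why a router that picks several of hundreds of experts has a very large number of alternative computation paths for each token.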