PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking

arXiv:2603.02023v2 Announce Type: replace

Test-time scaling has shown that allocating additional computation at inference can improve generation quality, motivating a natural follow-up question: where should this computation be spent? Building on this insight, we