Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

arXiv:2510.13879v2 Announce Type: replace
Within the landscape of inference-time scaling methods for foundation models, a width-based approach to scaling, which inserts tokens into the input stream to delay model responses, offers a unique advantage: it increases model expressivity while remaining highly parallelizable at both