Scaling Test-Time Compute for Agentic Coding

arXiv:2604.16529v1 Announce Type: cross Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked, or refined. Long-horizon coding agents violate this premise: each attempt produces an extended trajectory of the agent's actions, observations, errors, and partial progress. In this setting, the main challenge is no longer generating attempts, but representing prior experience in a form that can be effectively selected from and reused.