Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers

arXiv:2605.18374v1 Announce Type: new

Large language models (LLMs) typically approach combinatorial optimization as an inference-time procedure, solving each instance separately through sampling, search, or repeated prompting. We ask whether reinforcement learning can instead shift part of this reasoning cost into the weights of a code LLM, so that the model synthesizes a reusable solver for an entire problem family.