Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

arXiv:2605.14392v1 Announce Type: new We pursue a vision for self-improving language models in which the model does not merely generate problems or traces to imitate, but constructs the environments that train it. In zero-data reasoning RL, this reframes self-improvement from a data-generation loop into an environment-construction loop, where each artifact is a reusable executable object that samples instances, computes references, and scores responses.
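
As a rough illustration of what "a reusable executable object that samples instances, computes references, and scores responses" could look like, the following Python sketch defines a toy addition environment. It is not the paper's implementation: the names `Instance`, `SumEnvironment`, `sample`, `reference`, and `score` are hypothetical, and the task (integer addition with exact-match scoring) is an assumption chosen only because its answers are trivially verifiable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One sampled problem: the prompt plus the parameters that generated it."""
    prompt: str
    a: int
    b: int


class SumEnvironment:
    """Hypothetical toy environment (an assumption, not taken from the paper)."""

    def __init__(self, max_value: int = 99) -> None:
        self.max_value = max_value

    def sample(self, rng: random.Random) -> Instance:
        # Draw a fresh problem instance from the environment's distribution.
        a = rng.randint(0, self.max_value)
        b = rng.randint(0, self.max_value)
        return Instance(prompt=f"What is {a} + {b}?", a=a, b=b)

    def reference(self, instance: Instance) -> int:
        # Compute the ground-truth answer programmatically.
        return instance.a + instance.b

    def score(self, instance: Instance, response: str) -> float:
        # Verifiable reward: 1.0 for an exact integer match, 0.0 otherwise.
        try:
            answer = int(response.strip())
        except ValueError:
            return 0.0
        return 1.0 if answer == self.reference(instance) else 0.0


if __name__ == "__main__":
    env = SumEnvironment()
    inst = env.sample(random.Random(0))
    print(inst.prompt, "->", env.score(inst, str(env.reference(inst))))
```

Because the object carries its own sampler, reference computation, and scorer, it can be reused to produce an unbounded stream of scored training instances, which is the property that distinguishes an environment from a fixed set of generated problems.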