Execution-Verified Reinforcement Learning for Optimization Modeling

arXiv:2604.00442v1 Announce Type: new

Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver