CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

arXiv:2602.17684v2 Announce Type: replace

Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality test cases. We propose CodeScaler, a reward model designed to scale both reinforcement learning