Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

ArXi:2604.02322v1 Announce Type: new Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require complex