STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning

arXiv:2605.18851v1

Recent advances in Reinforcement Learning (RL) have underscored its potential for incentivizing the reasoning capabilities of Large Language Models (LLMs). However, existing step-level approaches rely on costly annotations that limit domain coverage, while scalar scores further impose an information bottleneck, offering insufficient semantic bandwidth to improve intermediate decisions.