AI RESEARCH

SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

arXiv cs.LG

arXiv:2604.16995v1 Announce Type: cross Reinforcement learning (RL) has emerged as a promising paradigm for