AI RESEARCH
EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning
arXiv CS.LG
arXiv:2508.07809v5 Announce Type: replace Reinforcement learning with verifiable reward (RLVR) has become a promising paradigm for post-