AI RESEARCH

Not all tokens are needed (NAT): token-efficient reinforcement learning

arXiv cs.LG

arXiv:2603.06619v1 Announce Type: new

Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout engines, full-token updates can consume a large fraction of total