AI RESEARCH
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
arXiv cs.AI
arXiv:2605.12070v1 Announce Type: cross Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also