AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

arXiv:2605.15565v1 Announce Type: cross Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires handling complex workloads, including multi-policy collaborative