Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces

ArXi:2603.06713v1 Announce Type: new Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows under weak or non-verifiable supervision. While frontier models mitigate these challenges through scale and large context budgets, small language models (SLMs) remain brittle: eager tool loading saturates context, execution errors compound over time, and sparse rewards limit learning. We