Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents

arXiv:2601.21699v2 Announce Type: replace Multi-turn reasoning agents solve complex questions by decomposing them into intermediate retrieval or tool-use steps, accumulating evidence across turns. Meanwhile, with reinforcement learning (RL)