Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards

arXiv:2603.24709v1

Multi-step tool orchestration, where LLMs must invoke multiple dependent APIs in the correct order while propagating intermediate outputs, remains challenging. State-of-the-art models frequently fail on full sequence execution, with parameter value errors accounting for a significant portion of failures.
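
To make the setting concrete, here is a minimal Python sketch of a two-step dependent tool chain, in which the second call consumes an output of the first. It is an illustration only and is not taken from the paper: the tools `search_user` and `list_orders`, the `$<step>.<field>` reference syntax, and the `run_plan` executor are all hypothetical names chosen for this example.

```python
from typing import Any, Callable

# Hypothetical toy tools (assumptions for illustration, not from the paper).
def search_user(name: str) -> dict[str, Any]:
    users = {"alice": 17}
    return {"user_id": users[name]}

def list_orders(user_id: int) -> dict[str, Any]:
    orders = {17: ["A-100", "A-101"]}
    return {"order_ids": orders[user_id]}

TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "search_user": search_user,
    "list_orders": list_orders,
}

def resolve(value: Any, results: list[dict[str, Any]]) -> Any:
    """Replace a reference such as "$0.user_id" with that field of step 0's output."""
    if isinstance(value, str) and value.startswith("$"):
        step, field = value[1:].split(".", 1)
        return results[int(step)][field]
    return value

def run_plan(plan: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute tool calls in order, propagating earlier outputs into later arguments."""
    results: list[dict[str, Any]] = []
    for call in plan:
        args = {key: resolve(val, results) for key, val in call["args"].items()}
        results.append(TOOLS[call["tool"]](**args))
    return results

# Correct plan: step 1 takes user_id from the output of step 0.
good_plan = [
    {"tool": "search_user", "args": {"name": "alice"}},
    {"tool": "list_orders", "args": {"user_id": "$0.user_id"}},
]

# Same tools in the same order, but a wrong parameter value in step 1.
bad_plan = [
    {"tool": "search_user", "args": {"name": "alice"}},
    {"tool": "list_orders", "args": {"user_id": 99}},
]
```

Under these assumptions, `run_plan(good_plan)` completes both steps, while `run_plan(bad_plan)` raises a `KeyError` in the second step even though the tool choice and ordering are right. This is the kind of parameter value error the abstract refers to.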