GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

arXiv:2604.15715v1

The development of general-purpose agents requires a shift from executing simple instructions to completing complex, real-world productivity workflows. However, current tool-use benchmarks remain misaligned with real-world requirements: they rely on AI-generated queries, dummy tools, and limited system-level coordination. To address this gap, we propose GTA-2, a hierarchical benchmark for General Tool Agents (GTA) that spans atomic tool use and open-ended workflows.