The Amazing Agent Race: Strong Tool Users, Weak Navigators

arXiv:2604.10261v1 Announce Type: new Existing tool-use benchmarks for LLM agents are overwhelmingly linear: our analysis of six benchmarks shows that 55% to 100% of instances are simple chains of 2 to 5 steps. We