Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent

ArXi:2603.05578v1 Announce Type: cross Research on self-evolving language agents has accelerated, drawing increasing attention to their ability to create, adapt, and maintain tools from task requirements. However, existing benchmarks predominantly rely on predefined specifications, which limits scalability and hinders truly autonomous evolution. While recent studies attempt to dynamically generate tools, they primarily emphasize downstream performance, resulting in a "black-box" evaluation that makes it difficult to attribute failures to specific causes.