NetArena: Dynamic Benchmarks for AI Agents in Network Automation

arXiv:2506.03231v2 Announce Type: replace-cross As AI agents expand into high-stakes domains like network system operations, evaluating their real-world reliability becomes increasingly critical. However, existing benchmarks risk contamination due to static design, show high statistical variance from limited dataset size, and fail to reflect the complexity of production environments. We present NetArena, a dynamic benchmark generation framework for network applications. NetArena