CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale

arXiv:2506.02548v3 Announce Type: replace-cross AI agents have significant potential to reshape cybersecurity, making a thorough assessment of their capabilities critical. However, existing evaluations fall short because they rely on small-scale benchmarks and measure only static outcomes, failing to capture the full, dynamic range of real-world security challenges. To address these limitations, we