ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents

arXiv:2605.14153v1

Exploitation is not a binary event. It is a ladder of progressively acquired capabilities, from executing a single buggy line of code to taking full control of the target. However, existing LLM security benchmarks treat a crash as exploitation success. That single binary outcome collapses the hard parts of exploitation: the transition from triggering a bug to constructing reusable primitives and gaining control.