Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

arXiv:2605.18583v1 Announce Type: cross Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated files, wipes a stale credentials backup, or rewrites configuration the user never mentioned. We call these scope expansions overeager actions, an authorization problem distinct from capability failures, prompt injection, or sandbox escapes. We present OverEager-Gen, a benchmark dedicated to overeager behavior on benign tasks.