AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arXiv:2604.02947v1 Announce Type: new Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. This creates a distinct safety challenge: harmful behavior may emerge through sequences of individually plausible steps, including intermediate actions that appear acceptable in isolation but together lead to unauthorized actions.