AI RESEARCH

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

arXiv CS.AI

arXiv:2604.04978v1 Announce Type: cross

Claude Code's auto mode is the first deployed permission system for AI coding agents, using a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate and 17% false negative rate on production traffic. We present the first independent evaluation of this system on deliberately ambiguous authorization scenarios, i.e., tasks where the user's intent is clear but the target scope, blast radius, or risk level is underspecified.
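To make the two reported error rates concrete, here is a minimal sketch of how a false positive rate and a false negative rate are conventionally computed from gate decisions. It assumes that "positive" means the gate blocked a tool call, which the abstract does not state. The `GateCounts` type, the function names, and the counts in the usage lines are hypothetical, chosen only so the arithmetic reproduces 0.4% and 17%; they are not data from the paper or from Anthropic.

```python
from dataclasses import dataclass


@dataclass
class GateCounts:
    """Hypothetical tally of gate decisions (assumes 'positive' = call blocked)."""
    true_positives: int   # dangerous calls that were blocked
    false_positives: int  # benign calls that were blocked
    true_negatives: int   # benign calls that were allowed
    false_negatives: int  # dangerous calls that were allowed


def false_positive_rate(c: GateCounts) -> float:
    """Share of benign calls that the gate wrongly blocked."""
    benign = c.false_positives + c.true_negatives
    if benign == 0:
        raise ValueError("no benign calls in the sample")
    return c.false_positives / benign


def false_negative_rate(c: GateCounts) -> float:
    """Share of dangerous calls that the gate wrongly allowed."""
    dangerous = c.false_negatives + c.true_positives
    if dangerous == 0:
        raise ValueError("no dangerous calls in the sample")
    return c.false_negatives / dangerous


# Illustrative counts only (not real data): 4 of 1000 benign calls blocked,
# 17 of 100 dangerous calls allowed.
example = GateCounts(true_positives=83, false_positives=4,
                     true_negatives=996, false_negatives=17)
print(f"FPR = {false_positive_rate(example):.1%}")
print(f"FNR = {false_negative_rate(example):.1%}")
```

The two rates use different denominators (benign calls versus dangerous calls), so a low false positive rate says nothing by itself about how many dangerous calls get through.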