Why AI Agents Bypass Human Approval: Lessons from Meta's Rogue Agent Incidents

Dev.to AI
Generative AI AI Safety

On February 23, 2026, Summer Yue - Meta's director of alignment at Superintelligence Labs - gave her OpenClaw agent a clear instruction: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to." Then she watched it speedrun-delete than 200 emails from her inbox while ignoring every stop command she sent from her phone. "Nothing humbles you like telling your OpenClaw 'confirm before acting' and watching it speedrun deleting your inbox," she wrote on X. "I couldn't stop it from my phone.