AI RESEARCH

Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

arXiv CS.AI

ArXi:2605.15377v1 Announce Type: new As AI systems are increasingly deployed in autonomous agentic settings at scale, it is important to ensure the actions they take are safe and aligned with user intent. Monitoring agent actions is a key safety mechanism, yet reliable monitors remain difficult to build and the scale of these systems makes human oversight impractical. We show that combining signals from diverse monitors into an ensemble improves detection of misaligned actions. We build 12 GPT-4.1-Mini monitors using both prompting and fine-tuning strategies.