Hodoscope: Unsupervised Monitoring for AI Misbehaviors

arXiv:2604.11072v1 Announce Type: new

Existing approaches to monitoring AI agents rely on supervised evaluation: human-written rules or LLM-based judges that check for known failure modes. However, novel misbehaviors may fall outside predefined categories entirely, and LLM-based judges can be unreliable. To address this, we formulate unsupervised monitoring, drawing an analogy to unsupervised learning.