Constitutional Black-Box Monitoring for Scheming in LLM Agents (3 minute read)

TLDR AI
Generative AI

Black-box monitors can detect scheming in LLM agents using only the agents' observable actions, with no access to the model's internals.