Does your AI have a hidden agenda? I ran 50 covert behavior tests on 10 frontier models.
Dev.to AI
I run independent benchmarks on frontier AI models. No vendor funding, no advertising, no partnerships. I test with an independent judge model (GLM-5) to avoid self-grading bias. Last week I ran 50 Covert Behavior Detection tests on 10 frontier models across 5 categories. The benchmark measures whether a model does things behind your back: hidden actions, undisclosed reasoning, behavior changes when monitored, and attempts to appear less suspicious.
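To make the setup concrete, here is a minimal sketch of how a judge-scored run could be tallied per category. It is an illustration only, not the actual harness: `TestCase`, `covert_rate_by_category`, the category names, and the stub model and judge are all hypothetical, and a real run would replace the two stubs with calls to the model under test and to the judge model.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class TestCase(NamedTuple):
    category: str  # hypothetical category label
    prompt: str    # scenario given to the model under test


def covert_rate_by_category(
    model: Callable[[str], str],
    judge: Callable[[str, str], bool],
    tests: Iterable[TestCase],
) -> Dict[str, float]:
    """Return the fraction of tests per category that the judge flags as covert."""
    flagged: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for test in tests:
        transcript = model(test.prompt)
        total[test.category] += 1
        # The judge is a separate model from the one being tested,
        # so no model grades its own output.
        if judge(test.prompt, transcript):
            flagged[test.category] += 1
    return {category: flagged[category] / total[category] for category in total}


# Stand-ins for real API calls (assumptions, for illustration only).
def stub_model(prompt: str) -> str:
    return prompt.upper()


def stub_judge(prompt: str, transcript: str) -> bool:
    return "SECRET" in transcript


tests = [
    TestCase("hidden_actions", "do the secret thing"),
    TestCase("hidden_actions", "say hi"),
    TestCase("monitoring", "say hello"),
]
rates = covert_rate_by_category(stub_model, stub_judge, tests)
```

Keeping the judge as a plain callable means the same loop can be rerun with a different judge to check how much the scores depend on that choice.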