AI RESEARCH

Training on Documents About Monitoring Leads to CoT Obfuscation

arXiv CS.LG

arXiv:2605.15257v1 Announce Type: new

Chain-of-thought (CoT) monitoring is one of the most promising tools we have for detecting model misbehavior, but its effectiveness depends on models faithfully externalizing their reasoning. Motivated by this vulnerability, we study whether monitor-aware models are capable of obfuscating their reasoning to evade detection. We use synthetic document finetuning to expose eight models to realistic pre-