AI RESEARCH

Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback

arXiv CS.LG

arXiv:2603.16928v1 Announce Type: cross Chain-of-thought (CoT) monitoring is proposed as a method for overseeing the internal reasoning of language-model agents. Prior work has shown that when models are explicitly informed that their reasoning is being monitored, or are fine-tuned to internalize this fact, they may learn to obfuscate their CoTs in ways that allow them to evade CoT-based monitoring systems. We ask whether reasoning agents can autonomously infer that their supposedly private CoT is under surveillance, and whether this awareness leads to strategic evasion, without any explicit.