AI RESEARCH
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
arXiv CS.AI
•
ArXi:2603.28590v1 Announce Type: new Large language models (LLMs) can generate chains of thought (CoTs) that are not always causally responsible for their final outputs. When such a mismatch occurs, the CoT no longer faithfully reflects the decision-critical factors driving the model's behavior, leading to the reduced CoT monitorability problem. However, a comprehensive and fully open-source benchmark for studying CoT monitorability remains lacking. To address this gap, we propose MonitorBench, a systematic benchmark for evaluating CoT monitorability in LLMs.