The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLM CoTs

arXiv:2510.17057v2 Announce Type: replace

Chain-of-Thought (CoT) monitoring has emerged as a compelling method for detecting harmful behaviors, such as reward hacking, in reasoning models, under the assumption that models' reasoning processes are informative about such behaviors. In practice, LLM