AI RESEARCH

Large Reasoning Models Learn Better Alignment from Flawed Thinking

arXiv CS.LG

arXiv:2510.00938v2 Announce Type: replace

Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer, yet they still lack the ability to reason critically about safety alignment and are easily biased when a flawed premise is injected into their thought process. We propose RECAP (Robust Safety Alignment via Counter-Aligned Prefilling), a principled reinforcement learning (RL) method for post-