AI RESEARCH
Large Reasoning Models Learn Better Alignment from Flawed Thinking
arXiv CS.LG
arXiv:2510.00938v2 Announce Type: replace

Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer, yet they still lack the ability to reason critically about safety alignment and are easily biased when a flawed premise is injected into their thought process. We propose RECAP (Robust Safety Alignment via Counter-Aligned Prefilling), a principled reinforcement learning (RL) method for post-