Internalizing Safety Understanding in Large Reasoning Models via Verification

arXiv:2605.08930v1 Announce Type: new. While explicit Chain-of-Thought (CoT) reasoning empowers large reasoning models (LRMs), it also enables them to generate riskier final answers. Current alignment paradigms rely primarily on externally enforced compliance: they optimize models to detect malicious prompts rather than to evaluate the safety of their own outputs.