Self-ReSET: Learning to Self-Recover from Unsafe Reasoning Trajectories

arXiv:2605.08936v1 Announce Type: new
Large Reasoning Models possess remarkable self-correction capabilities in general domains; however, they frequently struggle to recover from unsafe reasoning trajectories under adversarial attacks. Existing alignment methods attempt to mitigate this vulnerability by fine-tuning the model on expert data, such as reflection traces or adversarial prefixes. Crucially, these approaches are often hindered by static