AI RESEARCH
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
arXiv CS.AI
arXiv:2509.11629v2 Announce Type: replace-cross

As large language models (LLMs) continue to advance in capability, ensuring their safety against jailbreak attacks remains a critical challenge. In this paper, we