
Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment

arXiv cs.AI

arXiv:2603.11388v1

Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-