AI RESEARCH
Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment
arXiv CS.AI
arXiv:2603.11388v1 Announce Type: new Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-