AI RESEARCH
DART: Mitigating Harm Drift in Difference-Aware LLMs via Distill-Audit-Repair Training
arXiv cs.CL
arXiv:2604.16845v1 Announce Type: new Large language models (LLMs) tuned for safety often avoid acknowledging demographic differences, even when such acknowledgment is factually correct (e.g., ancestry-based disease incidence) or contextually justified (e.g., religious hiring preferences). This identity-blindness yields incorrect responses, unnecessary refusals, or generic "equal-treatment" defaults.