AI RESEARCH
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
arXiv cs.CL
arXiv:2602.22755v3 Announce Type: replace