AI RESEARCH
Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation
arXiv CS.CL
arXiv:2604.17020v1 Announce Type: new
Static benchmarks for harmful content detection face limitations in scalability and diversity, and may also be affected by contamination from web-scale pre-