AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models

arXiv:2509.26100v2

The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However, existing static benchmarks are ill-equipped to address the dynamic nature of AI risks and evolving regulations, creating a critical safety gap. This paper