AI RESEARCH

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

arXiv cs.CL

arXiv:2605.13411v1 Announce Type: cross Large language models remain vulnerable to adversarial prompts that elicit harmful outputs. Existing safety paradigms typically couple red-teaming and post-