AI RESEARCH

Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs

arXiv cs.AI

arXiv:2604.11120v1 Announce Type: new Personality imbuing customizes LLM behavior, but safety evaluations almost always study prompt-based personas alone. We show this is incomplete: prompting and activation steering expose *different*, architecture-dependent vulnerability profiles, and testing with only one method can miss a model's dominant failure mode.