EVA: Editing for Versatile Alignment against Jailbreaks

arXiv:2605.14750v1 Announce Type: cross Large Language Models (LLMs) and Vision Language Models (VLMs) have demonstrated impressive capabilities but remain vulnerable to jailbreaking attacks, where adversaries exploit textual or visual triggers to bypass safety guardrails. Recent defenses typically rely on safety fine-tuning or external filters to reduce the model's likelihood of producing harmful content. While effective to some extent, these methods often incur significant computational overhead and suffer from a safety-utility trade-off, degrading the model's performance on benign tasks.