Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

arXiv:2603.14825v1 Announce Type: cross Existing jailbreak defence frameworks for Large Vision-Language Models (LVLMs) often suffer from a safety-utility tradeoff, where strengthening safety inadvertently degrades performance on general visually grounded reasoning tasks. In this work, we investigate whether safety and utility are inherently antagonistic objectives. We focus on a modality-induced bias direction consistently observed across datasets, which arises from suboptimal coupling between the Large Language Model backbone and the visual encoders.