SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues

arXiv:2603.19092v1 Announce Type: cross

Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives these judgments. We study whether multimodal safety behavior in VLMs can be steered by simple semantic cues. We