AI RESEARCH
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
arXiv CS.AI
arXiv:2604.13803v1 Announce Type: cross

Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations closely mirror human neural processing are also resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety.