AI RESEARCH
Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding
arXiv CS.AI
arXiv:2603.29258v1 Announce Type: cross
Vision-Language Models (VLMs) have demonstrated strong capabilities across a wide range of multimodal tasks. However, recent studies have shown that VLMs such as CLIP perform poorly at understanding negation expressions, which are common in natural language.