AI RESEARCH

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

arXiv cs.LG

arXiv:2604.13313v1 Announce Type: new

Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word order and attribute binding. This limitation arises from a scarcity of informative samples needed to differentiate subtle semantic variations during contrastive pre