AI RESEARCH
Why Do Vision Language Models Struggle To Recognize Human Emotions?
arXiv CS.AI
•
ArXi:2604.15280v1 Announce Type: cross Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even the most sophisticated contemporary VLMs struggle to recognize human emotions or to outperform even specialized vision-only classifiers.