DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

arXiv:2604.10425v1 Announce Type: new Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we