ZINA: Multimodal Fine-grained Hallucination Detection and Editing

arXiv:2506.13130v2 Announce Type: replace-cross Multimodal Large Language Models (MLLMs) often generate hallucinations, where the output deviates from the visual content. Because these hallucinations take diverse forms, detecting them at a fine-grained level is essential for comprehensive evaluation and analysis. To this end, we propose a novel task of multimodal fine-grained hallucination detection and editing for MLLMs.