LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models

arXiv:2604.18803v1 Announce Type: cross Vision-Language Models (VLMs) are increasingly deployed in settings where reliable visual grounding carries operational consequences, yet their behavior under progressively coercive prompt phrasing remains undercharacterized. Existing hallucination benchmarks predominantly rely on neutral prompts and binary detection, leaving open how both the incidence and the intensity of fabrication respond to graded linguistic pressure across structurally distinct task types.