OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL

arXiv:2602.10687v3

Existing forgery detection methods are often limited to uni-modal or bi-modal settings and fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding.