ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models

arXiv:2509.15435v2 Announce Type: replace-cross

Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations arising from intrinsic errors and to adversarial attacks arising from external exploitation, which limits their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through structured inference-time reasoning with a suite of small vision models (fewer than 3B parameters).