KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains

arXiv:2604.16915v1

Retrieval-augmented generation (RAG) has transformed text-based question answering, yet its extension to visual domains remains hindered by four fundamental challenges: bridging the modality gap between image queries and text-heavy knowledge bases, constructing semantically meaningful visual knowledge bases, performing multi-hop reasoning over retrieved images, and verifying that generated answers are faithfully grounded in visual evidence.