AI RESEARCH

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

arXiv cs.AI

arXiv:2604.19689v1. Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning plans.
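To make the idea of conditioning retrieval on a structured reasoning plan more concrete, here is a minimal sketch. It is not A-MAR's implementation: the abstract does not describe the plan format or the retriever, so `PlanStep`, `retrieve`, `retrieve_for_plan`, the token-overlap scorer, and the toy corpus strings are all hypothetical stand-ins chosen for illustration. The only point shown is that each plan step issues its own query, so retrieved evidence stays attached to the reasoning step that asked for it.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    aspect: str  # hypothetical label for a reasoning step, e.g. "style"
    query: str   # text query issued for this step


def tokenize(text: str) -> set[str]:
    # Lowercased whitespace tokens; a real system would use learned embeddings.
    return set(text.lower().split())


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Toy scorer (assumption): rank documents by number of shared tokens.
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop those with no overlap at all.
    return [doc for score, doc in ranked[:k] if score > 0]


def retrieve_for_plan(
    plan: list[PlanStep], corpus: list[str], k: int = 1
) -> dict[str, list[str]]:
    # Retrieval is conditioned on the plan: one query per step,
    # with results keyed by the step's aspect.
    return {step.aspect: retrieve(step.query, corpus, k) for step in plan}


# Invented toy corpus and plan, for illustration only.
corpus = [
    "impressionism favors loose brushwork and natural light",
    "the painting was completed in paris in 1872",
    "oil paint on canvas was the dominant medium",
]
plan = [
    PlanStep("style", "loose brushwork style"),
    PlanStep("history", "when was the painting completed"),
]
evidence = retrieve_for_plan(plan, corpus)
for aspect, docs in evidence.items():
    print(aspect, docs)
```

Keying the evidence by plan step is the design choice that matters here: an explanation built from `evidence` can cite which retrieved passage supports which step, which is the kind of explicit grounding the abstract contrasts with implicit reasoning.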