AI RESEARCH

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

arXiv CS.CV

arXiv:2604.14029v1 Announce Type: new While Large Multimodal Models (LMMs) demonstrate impressive visual perception, they remain epistemically constrained by their static parametric knowledge. To transcend these boundaries, multimodal search models have been adopted to actively interact with the external environment for evidence retrieval. Diverging from prevailing paradigms that merely retrofit general LMMs with search tools as modular extensions, we explore the potential of building a multimodal agentic search model from scratch. Specifically, we make the following contributions: (i) we.