AI RESEARCH

From Web to Pixels: Bringing Agentic Search into Visual Perception

arXiv CS.CV

ArXi:2605.12497v1 Announce Type: new Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or frozen model knowledge. We study a practical yet harder open-world case where a visible object must first be resolved from external facts, recent events, long-tail entities, or multi-hop relations before it can be localized. We formalize this challenge as Perception Deep Research and.