AI RESEARCH
ProCap: Projection-Aware Captioning for Spatial Augmented Reality
arXiv CS.CV
arXiv:2604.00912v1 Announce Type: new Spatial augmented reality (SAR) directly projects digital content onto physical scenes using projectors, creating immersive experiences without head-mounted displays. However, for SAR to support intelligent interaction, such as reasoning about the scene or answering user queries, it must semantically distinguish between the physical scene and the projected content. Standard Vision Language Models (VLMs) struggle with this virtual-physical ambiguity, often confusing the two contexts. To address this issue, we