Curvature-Aware Captioning:Leveraging Geodesic Attention for 3D Scene Understanding

ArXi:2605.08808v1 Announce Type: cross Accurate 3D scene description is fundamental to robotic navigation and augmented reality, yet current dense captioning methods face significant limitations in processing sparse point cloud data. % Existing approaches that apply Euclidean embedding spaces struggle to simultaneously preserve fine-grained local geometric details and model exponentially growing global semantic hierarchies, leading to either inaccurate localization or disjointed, shallow scene descriptions.