WildDet3D: Scaling Promptable 3D Detection in the Wild

arXiv:2604.08626v1 Announce Type: new Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection: recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must generalize beyond closed-set categories, support diverse prompt modalities, and leverage geometric cues when available.