UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

arXiv:2603.23478v1

Functionality segmentation in 3D scenes requires an agent to ground implicit natural-language instructions into precise masks of fine-grained interactive elements. Existing methods rely on fragmented pipelines that suffer from visual blindness during initial task parsing. We observe that these methods are limited by single-scale, passive, and heuristic frame selection. We present UniFunc3D, a unified and