DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

arXiv:2506.05199v3 Announce Type: replace

A core task in embodied intelligence is ego-centric 3D visual grounding. Existing methods typically adopt two-stage, heterogeneous pipelines that pair a detector with a separate grounding model. Incompatible decoders and box heads hinder the transfer of object-level priors, and the split