SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding

arXiv:2605.14923v1

General scene perception has progressed from object recognition toward open-vocabulary grounding, part localization, and affordance prediction. Yet these capabilities are often realized as isolated predictions that localize objects, parts, or interaction points without capturing the structured dependencies needed for interaction-oriented scene understanding. To address this gap, we