MoSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation

arXiv:2604.19631v1 Dynamic Scene Graph Generation (DSGG) aims to build a structured model of the objects in a video sequence and their dynamic interactions, in support of high-level semantic understanding. However, existing methods struggle to model fine-grained relationships, to make full use of semantic representations, and to model tail relationships. To address these issues, this paper proposes a motion-guided semantic alignment method for DSGG (MoSA