SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track

arXiv cs.CV

arXiv:2603.27241v1 Announce Type: new Referring video object segmentation (RVOS) commonly grounds targets in videos based on static textual cues. The MeViS benchmark extends this by incorporating motion-centric expressions (referring and reasoning motion expressions) and