AI RESEARCH
SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track
arXiv cs.CV
arXiv:2603.27241v1 Announce Type: new
Referring video object segmentation (RVOS) commonly grounds targets in videos based on static textual cues. The MeViS benchmark extends this by incorporating motion-centric expressions (referring and reasoning motion expressions) and