One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
arXiv CS.AI
arXiv:2505.23617v3 Announce Type: replace-cross
Effective video tokenization is critical for scaling transformer models for long videos. Current approaches tokenize videos using space-time patches, leading to excessive tokens and computational inefficiencies. The best token reduction strategies degrade performance and barely reduce the number of tokens when the camera moves. We