TrajTok: Learning Trajectory Tokens enables better Video Understanding

ArXi:2602.22779v2 Announce Type: replace Tokenization in video models, typically through patchification, generates an excessive and redundant number of tokens. This severely limits video efficiency and scalability. While recent trajectory-based tokenizers offer a promising solution by decoupling video duration from token count, they rely on complex external segmentation and tracking pipelines that are slow and task-agnostic.