DynaTok: Temporally Adaptive and Positional Bias-Aware Token Compression for Video-LLMs

arXiv:2605.19322v1 Announce Type: new Recent advances in Video Large Language Models (Video-LLMs) have greatly expanded multimodal reasoning capabilities. However, the massive number of visual tokens extracted from long video sequences incurs prohibitive computational costs, limiting their deployment in real-world scenarios. Existing