V-CAST: Video Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models

arXiv:2603.27650v1

Video large language models (VideoLLMs) show strong capability in video understanding, yet the prefill stage of long-context inference is still dominated by a massive number of redundant visual tokens. We revisit token compression for VideoLLMs under a tight token budget and identify a key bottleneck: insufficient spatio-temporal information coverage. Existing methods often
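This excerpt does not describe how V-CAST itself selects tokens. The sketch below is a hypothetical illustration of what "insufficient spatio-temporal coverage" can mean under a fixed token budget, and it is not the authors' method. It assumes each visual token already has an importance score, arranged as a (frames x tokens) array. It compares a plain global top-k selection with a naive per-frame split of the same budget. The names `global_topk`, `per_frame_topk`, and `frame_coverage`, as well as the synthetic scores, are invented for illustration.

```python
import numpy as np

# Hypothetical illustration only: NOT the V-CAST algorithm.
# scores[t, n] is an assumed importance score for token n in frame t.

def global_topk(scores: np.ndarray, budget: int) -> list[tuple[int, int]]:
    """Keep the `budget` highest-scoring tokens across the whole video."""
    flat = np.argsort(scores, axis=None)[::-1][:budget]  # flattened indices, best first
    frames, tokens = np.unravel_index(flat, scores.shape)
    return list(zip(frames.tolist(), tokens.tolist()))

def per_frame_topk(scores: np.ndarray, budget: int) -> list[tuple[int, int]]:
    """Split the budget evenly over frames, then keep the top tokens within each frame."""
    num_frames = scores.shape[0]
    base, extra = divmod(budget, num_frames)
    kept = []
    for t in range(num_frames):
        k = base + (1 if t < extra else 0)  # distribute any remainder to the first frames
        idx = np.argsort(scores[t])[::-1][:k]
        kept.extend((t, int(i)) for i in idx)
    return kept

def frame_coverage(kept: list[tuple[int, int]], num_frames: int) -> float:
    """Fraction of frames that retain at least one token."""
    return len({t for t, _ in kept}) / num_frames

# Synthetic example: 8 frames x 16 tokens, where frame 0 scores uniformly higher.
rng = np.random.default_rng(0)
scores = rng.random((8, 16))
scores[0] += 1.0
budget = 16

g = global_topk(scores, budget)
p = per_frame_topk(scores, budget)
print(frame_coverage(g, 8))  # 0.125: the whole budget collapses onto frame 0
print(frame_coverage(p, 8))  # 1.0: every frame keeps two tokens
```

With the same budget, the global selection spends every token on one frame, while the per-frame split keeps all frames represented. An even split is only a crude baseline, and it ignores spatial coverage within each frame.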