Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs

arXiv:2509.08016v2 Announce Type: replace
Video Large Language Models (VideoLLMs) face a critical bottleneck: increasing the number of input frames to capture fine-grained temporal detail leads to prohibitive computational costs and performance degradation from long context lengths. We