VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing

arXiv:2605.03276v1 Announce Type: new Real-world video editing demands not only expert knowledge of cinematic techniques but also multimodal reasoning to select, align, and combine footage into coherent narratives. While recent Large Multimodal Models (LMMs) have shown remarkable progress in general video understanding, their abilities in multi-video reasoning and operational editing workflows remain largely unexplored. We