UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models

arXiv:2512.11336v2 Announce Type: replace

With the advancement of multi-modal Large Language Models (LLMs), Video LLMs have been further developed to perform both holistic and specialized video understanding. However, existing works are limited to specialized video understanding tasks and fail to achieve comprehensive, multi-grained video perception. To bridge this gap, we