AI RESEARCH

UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models

arXiv CS.CV

arXiv:2512.11336v2 Announce Type: replace With the advancement of multi-modal Large Language Models (LLMs), Video LLMs have been further developed to perform both holistic and specialized video understanding. However, existing work is limited to specialized video understanding tasks and fails to achieve comprehensive, multi-grained video perception. To bridge this gap, we