LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

arXiv:2603.19217v1 Announce Type: new Recent advancements in omnimodal large language models (OmniLLMs) have significantly improved the comprehension of audio and video inputs. However, current evaluations primarily focus on short audio and video clips ranging from 10 seconds to 5 minutes, failing to reflect the demands of real-world applications, where videos typically run for tens of minutes. To address this critical gap, we