Towards Temporal Compositional Reasoning in Long-Form Sports Videos
arXiv cs.CV
arXiv:2604.22226v1 (Announce Type: new)

Sports videos are a challenging domain for multimodal understanding because they involve complex, dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning over sports videos remains difficult: answering a question requires both locating temporally sparse evidence and integrating that evidence into the reasoning process.