Video Panels for Long Video Understanding

arXiv:2509.23724v2 Announce Type: replace Recent Video-Language Models (VLMs) achieve promising results on long-video understanding, but their performance still lags behind that achieved on tasks involving images or short videos. This has led to great interest in improving the long context modeling of VLMs by