VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding

arXiv:2603.07071v1 Announce Type: new Recent Vision-Language Models (VLMs) have made remarkable progress in multimodal understanding tasks, yet their evaluation on long video understanding remains unreliable. Because a model receives only a limited number of frames, the key frames needed to answer a question may be missing from its input.