QEVA: A Reference-Free Evaluation Metric for Narrative Video Summarization with Multimodal Question Answering

arXiv:2604.24052v1 Announce Type: new Video-to-text summarization remains underexplored in terms of comprehensive evaluation methods. Traditional n-gram overlap-based metrics and recent large language model (LLM)-based approaches depend heavily on human-written reference summaries, which limits their practicality and their sensitivity to nuanced semantic aspects. In this paper, we propose QEVA, a reference-free metric that evaluates candidate summaries directly against source videos through multimodal question answering.