A Benchmarking Methodology to Assess Open-Source Video Large Language Models in Automatic Captioning of News Videos

arXiv:2603.27662v1

News videos are among the most prevalent content types produced by television stations and online streaming platforms, yet generating textual descriptions to facilitate their indexing and retrieval remains a largely manual process. Video Large Language Models (VidLLMs) offer significant potential to automate this task, but a comprehensive evaluation in the news domain is still lacking.