Uncertainty-Aware and Decoder-Aligned Learning for Video Summarization

ArXi:2605.09507v1 Announce Type: new Video summarization aims to produce a compact representation of a long video by selecting a subset of temporally important segments that best reflect human preferences. This task is inherently difficult due to strong annotation subjectivity and the reliance on discrete decoding procedures, such as temporal segmentation and knapsack-based selection, during evaluation. Most existing approaches either and inference cost.