EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos

ArXi:2603.29943v1 Announce Type: new Counting in long videos remains a fundamental yet underexplored challenge in computer vision. Real-world recordings often span tens of minutes or longer and contain sparse, diverse events, making long-range temporal reasoning particularly difficult. However, most existing video counting benchmarks focus on short clips and evaluate only the final numerical answer, providing little insight into what should be counted or whether models consistently identify relevant instances across time. We