PushupBench: Your VLM is not good at counting pushups

arXiv:2604.23407v1

Large vision-language models (VLMs) can recognize \textit{what} happens in a video but fail to count \textit{how many} times it happens. We