AI RESEARCH
PushupBench: Your VLM is not good at counting pushups
arXiv CS.CV
arXiv:2604.23407v1 Announce Type: new Large vision-language models (VLMs) can recognize what happens in video but fail to count how many times it happens. We