ProfVLM: A lightweight video-language model for multi-view proficiency estimation

arXiv:2509.26278v4 Announce Type: replace-cross Most existing approaches formulate action quality assessment and skill proficiency estimation as discriminative prediction tasks, typically producing discrete labels or scores without explicitly modeling the reasoning process underlying the assessment. We instead reformulate the problem as generative vision-language modeling,