SBF: An Effective Representation to Augment Skeleton for Video-based Human Action Recognition

arXiv:2604.03590v1

Many modern video-based human action recognition (HAR) approaches use 2D skeletons as the intermediate representation in their prediction pipelines. Despite encouraging overall results, these approaches still struggle in many common scenes, mainly because a skeleton does not capture critical action-related information: the depth of the joints, the contour of the human body, and the interactions between the human and objects.