STARK: Spatio-Temporal Attention for Representation of Keypoints for Continuous Sign Language Recognition

arXiv:2603.16163v1 Announce Type: cross Continuous Sign Language Recognition (CSLR) is a crucial task for understanding the languages of deaf communities. Contemporary keypoint-based approaches typically rely on spatio-temporal encoding, where spatial interactions among keypoints are modeled using Graph Convolutional Networks or attention mechanisms, while temporal dynamics are captured using 1D convolutional networks. However, such designs often