Towards Compact Sign Language Translation: Frame Rate and Model Size Trade-offs

arXiv:2605.09554v1 Announce Type: cross

Sign Language Translation (SLT) converts sign language videos into spoken-language text, bridging communication between Deaf and hearing communities. Current gloss-free approaches rely on large encoder-decoder models, which limits their deployment. We propose a compact 77M-parameter pipeline that couples MMPose skeletal pose extraction with a single linear projection into T5-small.
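
The "single linear projection" step can be illustrated with a minimal sketch: each frame's pose keypoints are flattened into one vector and mapped to the 512-dimensional hidden size of T5-small. This is not the authors' code. The keypoint count (133, as in the COCO-WholeBody layout), the use of 2D coordinates, the initialization, and the helper names `make_projection` and `project_frames` are all assumptions made for illustration. Pure Python is used so the example runs without any dependencies.

```python
import random

D_MODEL = 512        # hidden size of T5-small
NUM_KEYPOINTS = 133  # assumption: COCO-WholeBody layout; the paper's keypoint set is not stated here
COORDS = 2           # assumption: (x, y) per keypoint


def make_projection(in_dim, out_dim, seed=0):
    """Create a randomly initialized weight matrix (out_dim x in_dim) and a zero bias."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project_frames(frames, weight, bias):
    """Map each flattened pose frame x to an embedding y = W x + b."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b
         for row, b in zip(weight, bias)]
        for frame in frames
    ]


in_dim = NUM_KEYPOINTS * COORDS
weight, bias = make_projection(in_dim, D_MODEL)

# Four dummy frames standing in for pose-estimator output.
frames = [[0.5] * in_dim for _ in range(4)]
embeddings = project_frames(frames, weight, bias)

# Parameters added by the projection: one weight per input/output pair, plus biases.
num_params = in_dim * D_MODEL + D_MODEL
print(len(embeddings), len(embeddings[0]), num_params)
```

Under these assumptions the projection adds only about 0.14M parameters, so nearly all of the model's size would sit in the T5-small encoder-decoder. The resulting sequence of per-frame embeddings is what would take the place of token embeddings at the encoder input.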