ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

arXiv:2511.18082v2

Recent Vision-Language-Action (VLA) models have shown impressive flexibility and generalization, yet their deployment in robotic manipulation remains limited by heavy computational overhead and inference latency. In this work, we present ActDistill, a general action-guided self-derived distillation framework that transfers the action prediction capability of any existing VLA model to a lightweight counterpart.