EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation

arXiv:2603.18739v1 Announce Type: new Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency tradeoff, even with large-scale pre.