CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding

arXiv:2605.02218v1 Announce Type: new Vision-language models (VLMs) have demonstrated strong capabilities in multimodal perception and reasoning. However, deploying large VLMs on mobile devices remains challenging due to their substantial computational and memory demands. A practical alternative is device-edge co-inference, where a lightweight draft VLM on the mobile device collaborates with a larger target VLM on the edge server via speculative decoding.
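
To make the draft/target collaboration concrete, the following is a minimal sketch of generic greedy speculative decoding, not the CoVSpec method itself. It assumes text-only integer tokens and greedy decoding, and all names (`speculative_step`, `generate`, `toy_draft`, `toy_target`) are hypothetical stand-ins for the on-device draft model and the edge-side target model. A real system would verify all drafted tokens in a single parallel forward pass of the target model; the sketch calls the target once per token only for clarity.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one draft-then-verify round and return the tokens to append."""
    # Device side (assumed): the small draft model proposes k tokens.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # Edge side (assumed): the target model checks each drafted token.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token, discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft_next: NextToken,
             target_next: NextToken, k: int, max_new_tokens: int) -> List[Token]:
    """Repeat draft-then-verify rounds until max_new_tokens are produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prompt) + max_new_tokens]


def target_only(prompt: List[Token], target_next: NextToken, n: int) -> List[Token]:
    """Baseline: plain greedy decoding with the target model alone."""
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out


# Toy stand-ins for the two models (purely illustrative).
def toy_target(ctx: List[Token]) -> Token:
    return (sum(ctx) + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except when the context length is a multiple of 4.
    t = (sum(ctx) + 1) % 7
    return t if len(ctx) % 4 else (t + 1) % 7


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec = generate(prompt, toy_draft, toy_target, k=4, max_new_tokens=10)
    base = target_only(prompt, toy_target, 10)
    print(spec == base)
```

Under greedy verification, a drafted token is kept only when it equals the target's own choice, so the output is the same as decoding with the target model alone. The potential saving is in how many target-model rounds (and, in a device-edge setting, how many round trips) are needed.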