MMSpec: Benchmarking Speculative Decoding for Vision-Language Models

arXiv:2603.14989v1 Announce Type: new Vision-language models (VLMs) achieve strong performance on multimodal tasks but suffer from high inference latency due to large model sizes and long multimodal contexts. Speculative decoding has recently emerged as an effective acceleration technique, yet its behavior in VLMs remains insufficiently understood. We