VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading

arXiv:2605.05899v1 Announce Type: new

Large-scale vision-language mixture-of-experts (VL-MoE) models provide strong multimodal capability, but efficient deployment on memory-constrained platforms remains difficult. Existing MoE offloading systems are largely designed for text-centric workloads and become much less effective for visual-heavy inputs, where large numbers of visual tokens induce broader and less predictable expert accesses.