OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport

arXiv:2602.20205v3 Announce Type: replace
Multimodal large language models (MLLMs) achieve strong visual-language reasoning but incur high inference costs because of redundant visual tokens. Recent work explores visual token pruning to accelerate inference, but existing pruning methods overlook the underlying distributional structure of visual representations. We propose OTPrune, a