Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models

arXiv:2603.16001v1 Announce Type: cross
Network pruning is an effective technique for building lightweight Large Vision-Language Models (LVLMs), and its importance metrics primarily incorporate both weights and activations. However, existing methods typically process calibration data from different modalities in a unified manner, overlooking modality-specific behaviors. This raises a critical challenge: how to account for the divergent behaviors of textual and visual tokens so that LVLMs can be pruned accurately.
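
To make the idea of an importance metric built from both weights and activations concrete, here is a minimal sketch. The abstract does not specify the metric, so this assumes a common stand-in: the score of each weight is its magnitude times the L2 norm of the matching input feature over the calibration tokens. All function names, shapes, and the random "text" and "visual" activations are hypothetical and chosen only for illustration; this is not the paper's method.

```python
import numpy as np

def importance_scores(weight, activations):
    """Assumed stand-in metric: |W_ij| * ||X_j||_2.

    weight:      (out_features, in_features)
    activations: (num_calibration_tokens, in_features)
    """
    act_norm = np.linalg.norm(activations, axis=0)  # one norm per input feature
    return np.abs(weight) * act_norm[None, :]

def prune_mask(scores, sparsity):
    """Boolean keep-mask that drops the lowest-scoring weights in each output row."""
    n_prune = int(scores.shape[1] * sparsity)
    order = np.argsort(scores, axis=1)  # ascending: least important first
    mask = np.ones_like(scores, dtype=bool)
    rows = np.arange(scores.shape[0])[:, None]
    mask[rows, order[:, :n_prune]] = False
    return mask

# Hypothetical calibration data: random stand-ins for token activations.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
text_acts = rng.normal(size=(16, 8))
# Visual tokens are given a different per-feature scale purely to mimic
# modality-specific statistics; this is an illustrative assumption.
visual_acts = rng.normal(size=(64, 8)) * np.linspace(0.1, 2.0, 8)

# "Unified" calibration pools both modalities into one activation matrix.
unified_acts = np.concatenate([text_acts, visual_acts], axis=0)

mask_text = prune_mask(importance_scores(W, text_acts), sparsity=0.5)
mask_visual = prune_mask(importance_scores(W, visual_acts), sparsity=0.5)
mask_unified = prune_mask(importance_scores(W, unified_acts), sparsity=0.5)

# Fraction of weights on which text-only and visual-only calibration agree.
agreement = float((mask_text == mask_visual).mean())
print(f"text/visual mask agreement: {agreement:.2f}")
```

Comparing `mask_text`, `mask_visual`, and `mask_unified` on real calibration activations would be one way to check how much the pruning decision depends on which modality supplies the statistics.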