QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models

arXiv:2604.02816v1 Announce Type: cross Multimodal Large Language Models (MLLMs) have shown strong reasoning ability, but their high computational and memory costs hinder deployment in resource-constrained settings. While Post-