AI RESEARCH
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
arXiv CS.LG
arXiv:2603.14549v1 Announce Type: cross While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck. Though recent token reduction strategies attempt to accelerate inference, such methods inadequately exploit attention values and fail to address token redundancy. Critically, they overlook the "attention shift" phenomenon inherent in LVLMs, which skews token attention scores. In this work, we propose ASAP, a novel.