Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs

arXiv:2604.20937v1 Announce Type: new Video Large Language Models (Video LLMs) incur high inference latency due to the large number of visual tokens provided to the LLM. To address this