AI RESEARCH
Tango: Taming Visual Signals for Efficient Video Large Language Models
arXiv cs.CV
arXiv:2604.09547v1 Announce Type: new
Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advances the two predominant token-pruning paradigms: attention-based selection and similarity-based clustering.
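For readers unfamiliar with the two paradigms named above, the following is a minimal generic sketch of each. It is not Tango's method, which this excerpt does not describe. The function names (`attention_topk`, `similarity_clusters`, `merge`), the greedy first-match clustering rule, the 0.9 cosine threshold, and the toy 2-D tokens are all assumptions made for illustration.

```python
import math


def attention_topk(scores, keep):
    """Attention-based selection (illustrative): return the indices of the
    `keep` highest-scoring tokens, restored to their original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_clusters(tokens, threshold=0.9):
    """Similarity-based clustering (illustrative, greedy): a token joins the
    first cluster whose representative (that cluster's first token) has
    cosine similarity >= threshold; otherwise it starts a new cluster."""
    clusters = []
    for i, tok in enumerate(tokens):
        for members in clusters:
            if cosine(tok, tokens[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def merge(tokens, clusters):
    """Replace each cluster with the element-wise mean of its member tokens."""
    return [
        [sum(col) / len(members) for col in zip(*(tokens[i] for i in members))]
        for members in clusters
    ]


# Toy data (hypothetical): four 2-D "visual tokens" and one attention score each.
tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
scores = [0.1, 0.5, 0.3, 0.9]

kept = attention_topk(scores, keep=2)   # [1, 3]
clusters = similarity_clusters(tokens)  # [[0, 1], [2, 3]]
merged = merge(tokens, clusters)        # one averaged token per cluster
```

In this sketch, selection discards low-scoring tokens outright, while clustering keeps a merged summary of every group of near-duplicate tokens. Both reduce four tokens to two here, but they retain different information.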