Tango: Taming Visual Signals for Efficient Video Large Language Models

arXiv:2604.09547v1 Announce Type: new
Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models (Video LLMs). This work revisits and advances the two predominant token-pruning paradigms: attention-based selection and similarity-based clustering.