Frequency-Modulated Visual Restoration for Matryoshka Large Multimodal Models

arXiv:2603.11220v1 Announce Type: cross Large Multimodal Models (LMMs) struggle to adapt to varying computational budgets because of the large number of visual tokens they process. Previous methods attempted to reduce the number of visual tokens before or within LLMs. However, these strategies inevitably result in the loss of visual semantics. To address these issues, we