POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs

arXiv:2604.11627v1 Announce Type: new Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable capabilities in cross-modal understanding and generation. However, the rapid growth of visual token sequences--especially in long-video and streaming scenarios--poses a major challenge to their scalability and real-world deployment. Thus, we