MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

arXiv:2604.27393v1 Announce Type: new Recent progress in multimodal large language models (MLLMs) has moved AI capabilities from static offline data processing to real-time streaming interaction, yet these models remain far from human-level multimodal interaction. The key bottleneck is no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation.