Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

ArXi:2510.24821v3 Announce Type: replace-cross We propose Ming-Flash-Omni, an upgraded version of Ming-Omni, built upon a sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0 with 100B total parameters, of which only 6.1B are active per token. This architecture enables highly efficient scaling (dramatically improving computational efficiency while significantly expanding model capacity) and empowers stronger unified multimodal intelligence across vision, speech, and language, representing a key step toward Artificial General Intelligence.