Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems

arXiv:2603.29211v1 In recent years, multimodal large models have continued to improve on general benchmarks. However, in real-world content moderation and adversarial settings, mainstream models still suffer from degraded generalization and catastrophic forgetting because of limited fine-grained visual perception and insufficient modeling of long-tail noise. In this paper, we present Xuanwu VL-2B as a case study of how general multimodal models can be developed into an industrial-grade foundation model for content ecosystems.