OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

arXiv:2605.11629v1 Announce Type: new Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by latency and resource constraints. In practice, smaller MLLMs are preferred for online serving, yet their reasoning performance is bottlenecked by the lack of large-scale, high-quality multimodal CoT supervision.