AI RESEARCH

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

arXiv cs.CV

arXiv:2604.21409v1

We present S1-VL, a multimodal reasoning model for scientific domains that natively supports two complementary reasoning paradigms. Scientific Reasoning relies on structured chain-of-thought. Thinking-with-Images lets the model actively manipulate images by executing Python code during reasoning. In Thinking-with-Images mode, the model generates image-processing code, executes it in a sandbox environment, obtains intermediate visual results, and continues reasoning over multiple iterative turns.
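The multi-turn Thinking-with-Images loop can be illustrated with a minimal sketch. This is not S1-VL's implementation. `CodeAction`, `FinalAnswer`, `run_in_sandbox`, `think_with_images` and `scripted_policy` are hypothetical names. A scripted policy stands in for the model, images are toy nested lists of pixel intensities, and `exec` with a restricted builtins table is only a placeholder for a real sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

# Toy grayscale image: rows of integer pixel intensities (an assumption for illustration).
Image = List[List[int]]


@dataclass
class CodeAction:
    """Hypothetical action: Python source that reads `image` and assigns `result`."""
    code: str


@dataclass
class FinalAnswer:
    """Hypothetical action: the model stops calling tools and answers."""
    text: str


Action = Union[CodeAction, FinalAnswer]
History = List[Tuple[str, Image]]  # (executed code, intermediate visual result)


def run_in_sandbox(code: str, image: Image) -> Image:
    """Run generated code against the image and return its `result`.

    Only a whitelist of builtins is exposed. This limits name lookup but is
    NOT a secure sandbox; a real system would isolate execution in a
    separate process or container.
    """
    namespace = {
        "__builtins__": {"range": range, "len": len, "min": min, "max": max},
        "image": image,
    }
    exec(code, namespace)
    return namespace["result"]


def think_with_images(
    policy: Callable[[History, Image], Action],
    image: Image,
    max_turns: int = 4,
) -> Tuple[Optional[str], History]:
    """Multi-turn loop: generate code, execute it, observe the result, repeat."""
    history: History = []
    for _ in range(max_turns):
        action = policy(history, image)
        if isinstance(action, FinalAnswer):
            return action.text, history
        observation = run_in_sandbox(action.code, image)
        history.append((action.code, observation))
    return None, history  # turn budget exhausted without an answer


def scripted_policy(history: History, image: Image) -> Action:
    """Hypothetical stand-in for the model: crop first, then answer from the crop."""
    if not history:
        # Turn 1: emit code that crops the top-left 2x2 region.
        return CodeAction("result = [row[0:2] for row in image[0:2]]")
    crop = history[-1][1]
    return FinalAnswer(f"brightest pixel in crop = {max(max(row) for row in crop)}")


image = [[1, 9, 3], [4, 5, 6], [7, 8, 2]]
answer, history = think_with_images(scripted_policy, image)
print(answer)         # brightest pixel in crop = 9
print(history[0][1])  # [[1, 9], [4, 5]]
```

Each executed snippet and its output are appended to the history. The next turn can therefore condition on the intermediate visual result rather than only on the original image. The `max_turns` cap bounds the loop if the policy never commits to an answer.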