ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding

arXiv:2604.06685v1 Announce Type: new While Vision-Language Models (VLMs) have demonstrated significant potential in chemical visual understanding, current models are predominantly optimized for direct visual question-answering tasks. This paradigm often results in "black-box" systems that fail to utilize the inherent capability of Large Language Models (LLMs) to infer underlying reaction mechanisms. In this work, we