FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

arXiv:2604.07413v1 Announce Type: new

The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to move beyond simple perception toward autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered by data scarcity and by the lack of fine-grained domain semantics in existing datasets. To bridge this gap, we