FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arXiv:2604.03893v1 Announce Type: new Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information extraction rather than the global structural logic inherent in formal scientific notations. In this work, we