A Dialogue-Based Framework for Correcting Multimodal Errors in AI-Assisted STEM Education

arXiv:2605.04131v1 Announce Type: cross Large Language Models (LLMs) are democratizing access to personalized tutoring; however, their effectiveness is hindered by challenges in processing multimodal content, which limits AI's potential to provide equitable, high-quality STEM education. This study evaluates LLM performance on multimodal physics problems, identifies specific failure modes through an empirical error taxonomy, and tests practical interventions designed to overcome multimodal processing limitations.