Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

arXiv:2506.04822v3

Despite rapid progress in vision-language models (VLMs) and large language models (LLMs), their effectiveness for AI-driven educational assessment in real-world, underrepresented classrooms remains largely unexplored. We evaluate state-of-the-art VLMs and LLMs on over 14K handwritten answers from grade-4 classrooms in Indonesia, covering Mathematics and English aligned with the local national curriculum.