SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

arXiv:2604.25472v1

Evaluating instructional materials for K-12 science education has become increasingly important as educators use generative AI to create them. However, reviewing instructional materials is time-consuming, expertise-intensive, and difficult to scale, which motivates interest in automated evaluation approaches. While large language models (LLMs) have shown strong performance on general evaluation tasks, their performance and reliability on instructional materials remain unclear.