Confidence Estimation in Automatic Short Answer Grading with LLMs

arXiv:2605.00200v1

Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational assessment. Despite these advances, LLM-based grading remains imperfect, making reliable confidence estimates essential for safe and effective human-AI collaboration in educational decision-making.