Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment

arXiv:2604.19781v1. Automated scoring of student work at scale requires balancing accuracy against cost and latency. In "cascade" systems, small language models (LMs) handle the easier scoring tasks and escalate the harder ones to larger LMs; the challenge is deciding which cases to escalate. We explore verbalized confidence, in which the LM is asked to state a numerical confidence alongside its prediction, as a routing signal.
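To make the routing idea concrete, here is a minimal sketch of a two-stage cascade driven by verbalized confidence. It is illustrative only and not the authors' implementation: the `Score: ... / Confidence: ...` response format, the 0.8 threshold, and the names `parse_response`, `cascade_score`, `small_lm`, and `large_lm` are all assumptions introduced for this example. The two LMs are modeled as plain callables that return text.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed (hypothetical) response format: "Score: 3\nConfidence: 0.85"
_PATTERN = re.compile(
    r"Score:\s*(\d+).*?Confidence:\s*(\d+(?:\.\d+)?)", re.S | re.I
)


def parse_response(text: str) -> Optional[Tuple[int, float]]:
    """Extract (score, confidence) from LM output; None if malformed."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    score, confidence = int(match.group(1)), float(match.group(2))
    # Treat confidences outside [0, 1] as unusable rather than rescaling them.
    if not 0.0 <= confidence <= 1.0:
        return None
    return score, confidence


def cascade_score(
    student_work: str,
    small_lm: Callable[[str], str],
    large_lm: Callable[[str], str],
    threshold: float = 0.8,  # assumed value; would be tuned on held-out data
) -> Tuple[int, str]:
    """Return (score, which_model), escalating when the small LM is unsure."""
    parsed = parse_response(small_lm(student_work))
    if parsed is not None and parsed[1] >= threshold:
        return parsed[0], "small"
    # Low or unparseable confidence: escalate to the larger model.
    parsed_large = parse_response(large_lm(student_work))
    if parsed_large is None:
        raise ValueError("large LM returned an unparseable response")
    return parsed_large[0], "large"
```

In this sketch an unparseable or out-of-range confidence is treated like a low one, so the case is escalated rather than trusted. Raising the threshold sends more cases to the larger model, which trades cost and latency for accuracy.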