Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

ArXi:2604.22542v1 Announce Type: cross Large language models (LLMs) often fail to meet the pedagogical needs of K-12 English learners in non-native contexts due to a proficiency mismatch. To address this widespread challenge, we Our core technical contribution is the \textbf{DDPO} algorithm,Diversity Driven Policy Optimization, a multi-turn GRPO-based approach designed to preserve dialogue diversity while holistically optimizing dialogue quality.