Are Agents Ready to Teach? A Multi-Stage Benchmark for Real-World Teaching Workflows

arXiv:2605.14322v1 Announce Type: new Language agents are increasingly deployed in complex professional workflows, with tutoring emerging as a particularly high-stakes capability that remains largely unmeasured in existing benchmarks. Effective tutor agents require more than producing correct answers or executing accurate tool calls: a robust tutor must diagnose learner state, adapt over time, make pedagogically justified decisions grounded in educational evidence, and execute interventions within realistic learning-management systems. We.