Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback

arXiv:2511.08225v2 Announce Type: replace-cross
As teachers increasingly turn to GenAI in their educational practice, we need robust methods to benchmark large language models (LLMs) for pedagogical purposes. This article presents an embedding-based benchmarking framework to detect bias in LLMs in the context of formative feedback.
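
To make the idea of an embedding-based bias check concrete, here is a minimal sketch. It is not the article's actual pipeline, which the abstract does not describe in detail. It assumes that feedback texts have already been converted to embedding vectors by some model, and that bias is probed by comparing feedback generated for paired inputs that differ only in gender cues against feedback for control pairs with no gender change. The function names, the toy vectors, and the choice of cosine distance are all illustrative assumptions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_paired_distance(pairs):
    """Average cosine distance over a list of (embedding_a, embedding_b) pairs."""
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings (assumption: real ones would come from an
# embedding model applied to LLM-generated feedback texts).
# Each pair holds feedback embeddings for the same essay with gender cues swapped.
gender_swapped_pairs = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ([1.0, 1.0, 0.0], [1.0, 0.0, 1.0]),
]
# Control pairs: the same essay submitted twice with no gender change.
control_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 2.0, 0.0], [0.0, 1.0, 0.0]),
]

# A positive gap means feedback shifts more under a gender swap than under
# the control condition; whether the gap is meaningful would need a
# statistical test, which this sketch omits.
bias_gap = mean_paired_distance(gender_swapped_pairs) - mean_paired_distance(control_pairs)
print(f"bias gap: {bias_gap:.3f}")
```

In practice the control condition matters: LLM output varies between runs even for identical prompts, so the gender-swapped distance is only informative relative to that baseline variation.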