Inference Energy and Latency in AI-Mediated Education: A Learning-per-Watt Analysis of Edge and Cloud Models

ArXi:2603.20223v1 Announce Type: cross Immediate feedback is a foundational requirement of effective AI-mediated learning, yet the energy and latency costs of delivering it remain largely unexamined. This study investigates the latency-energy-learning trade-off in AI tutoring through an empirical comparison of two on-device inference configurations of Microsoft Phi-3 Mini (4k-instruct) on an NVIDIA T4 GPU: full-precision FP16 and 4-bit NormalFloat (NF4) quantisation.