AcademiClaw: The Benchmark Where Even the Best AI Agents Flunk 45% of Real Student Work

Towards AI

80 Real Student Tasks Reveal a 55% Ceiling, a Token-Quality Disconnect, and Three Distinct Ways AI Agents Fail