AcademiClaw: The Benchmark Where Even the Best AI Agents Flunk 45% of Real Student Work
Towards AI
80 Real Student Tasks Reveal a 55% Ceiling, a Token-Quality Disconnect, and Three Distinct Ways AI Agents Fail