The Frog Who Aced the Benchmark

Towards AI
AI Research

From Zhuangzi to modern AI benchmarks - why a perfect score inside the well tells you nothing about the ocean.

The Frog at the Bottom of the Well - 井底之蛙

The AI scored 98% on the test. Researchers celebrated. It was deployed. Then someone asked it a question it had never seen before - from a domain outside its