This new benchmark could expose AI’s biggest weakness

The influential AI researcher François Chollet has long argued that the field measures intelligence incorrectly: popular benchmarks, he says, reward a model’s ability to memorize vast amounts of data rather than its ability to navigate novel situations and learn new skills. Only recently, with the rise of autonomous AI agents, have companies begun to take that critique seriously. On Tuesday, the ARC Prize Foundation, which Chollet founded with Zapier cofounder Mike Knoop, released a new and difficult version of its benchmark.