BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models

arXiv:2603.14761v1 Announce Type: new Large language models (LLMs) achieve impressive scores on standard benchmarks yet routinely fail questions that any human would answer correctly in seconds. We