BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models
arXiv CS.AI
arXiv:2603.14761v1 Announce Type: new
Large language models (LLMs) achieve impressive scores on standard benchmarks, yet they routinely fail questions that any human would answer correctly in seconds. We