AI RESEARCH

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition

arXiv CS.AI

ArXi:2604.16009v1 Announce Type: new Metacognition, the ability to monitor and regulate one's own reasoning, remains under-evaluated in AI benchmarking. We