AI RESEARCH
MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
arXiv CS.AI
•
ArXi:2604.16009v1 Announce Type: new Metacognition, the ability to monitor and regulate one's own reasoning, remains under-evaluated in AI benchmarking. We