AI RESEARCH

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

arXiv cs.CL

arXiv:2605.09063v2

Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems require applying such reasoning to advance the frontier of mathematical knowledge itself, which makes them a compelling alternative.