LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches

arXiv:2604.01754v1 Announce Type: cross Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination.