Evaluating Large Language Models in Scientific Discovery

arXiv:2512.15567v2

Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We