ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition

arXiv:2503.21248v3 Announce Type: replace Large language models (LLMs) have shown potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined due to the lack of a dedicated benchmark. To address this gap, we