MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts

arXiv:2604.06505v1 Announce Type: new Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific