AI RESEARCH
MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts
arXiv cs.CL
arXiv:2604.06505v1 Announce Type: new Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific