MOOSE-Star (ICML 2026): 7B model + 108K-paper dataset for scientific hypothesis discovery
r/LocalLLaMA
Disclosure first: I work on community at MiroMind.

One of our researchers just dropped the full MOOSE-Star collection on Hugging Face: a 7B model post-trained for scientific hypothesis discovery, plus the dataset behind it. Paper accepted at ICML 2026.

🤗 Collection:

Inside:

- MS-IR-7B / MS-HC-7B / MS-7B: 7B models for inspiration retrieval, hypothesis composition, and joint use. Base: DeepSeek-R1-Distill-Qwen-7B.
- TOMATO-Star: 108,717 NCBI papers decomposed into (background, hypothesis, inspirations), every inspiration anchored to a real citation.
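For anyone wondering what a (background, hypothesis, inspirations) triple might look like in practice, here is a rough Python sketch. The class names, field names, and the `all_inspirations_anchored` helper are my own assumptions for illustration, not the actual TOMATO-Star schema, so check the dataset card for the real column names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inspiration:
    # Hypothetical fields; the real dataset schema may differ.
    summary: str      # what idea is borrowed from the cited work
    cited_paper: str  # the citation this inspiration is anchored to


@dataclass
class TomatoRecord:
    # One paper decomposed into the three parts named in the post.
    background: str
    hypothesis: str
    inspirations: List[Inspiration] = field(default_factory=list)


def all_inspirations_anchored(record: TomatoRecord) -> bool:
    """Return True if the record has at least one inspiration and
    every inspiration names a non-empty citation."""
    return bool(record.inspirations) and all(
        insp.cited_paper.strip() for insp in record.inspirations
    )


# Placeholder values only; this is not real dataset content.
example = TomatoRecord(
    background="<research question and prior context>",
    hypothesis="<the paper's core hypothesis>",
    inspirations=[
        Inspiration(summary="<borrowed idea>", cited_paper="<citation>"),
    ],
)
```

With a layout like this, the split between the models would roughly be: inspiration retrieval picks candidate `inspirations` for a given `background`, and hypothesis composition combines them into a `hypothesis`.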