MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

arXiv:2605.08678v1 Announce Type: new Modern AI progress has been driven by ML methods that are generalizable across settings and scalable to larger regimes. As large language models demonstrate advanced capabilities in reasoning, coding, and engineering tasks, it is increasingly important to understand whether they can discover such methods rather than only apply existing ones. We