Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

Apple Machine Learning Research

While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pre