Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Apple Machine Learning Research
While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pre