Improving Training Efficiency with Effective Training Time (19 minute read)

TLDR AI

Meta introduced Effective Training Time (ETT%), a metric for the share of end-to-end training runtime that is spent on actual learning. It exposes overhead such as checkpointing and failures. The post outlines system-level and PyTorch-level optimizations that reduce this wasted time and improve large-scale training efficiency.
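
To make the metric concrete, here is a minimal sketch that assumes ETT% is simply productive training time divided by total wall-clock time. The post's exact accounting may differ; the `RunBreakdown` fields, the function name, and the hour values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunBreakdown:
    """Wall-clock hours for one training run (hypothetical categories)."""
    productive_hours: float            # steps that actually advance the model
    checkpoint_hours: float            # time blocked on saving checkpoints
    failure_hours: float               # time lost to failures and restarts
    other_overhead_hours: float = 0.0  # any remaining non-training time

    @property
    def total_hours(self) -> float:
        # End-to-end runtime is the sum of productive time and all overhead.
        return (
            self.productive_hours
            + self.checkpoint_hours
            + self.failure_hours
            + self.other_overhead_hours
        )


def effective_training_time_pct(run: RunBreakdown) -> float:
    """Assumed definition: 100 * productive time / end-to-end runtime."""
    if run.total_hours <= 0:
        raise ValueError("total runtime must be positive")
    return 100.0 * run.productive_hours / run.total_hours


# Made-up numbers: 90 productive hours out of a 100-hour run.
run = RunBreakdown(productive_hours=90.0, checkpoint_hours=4.0, failure_hours=6.0)
print(f"ETT% = {effective_training_time_pct(run):.1f}")  # ETT% = 90.0
```

Under this reading, cutting checkpoint or failure time raises ETT% without changing the amount of learning done, which is the kind of waste the metric is meant to surface.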