AI RESEARCH
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
arXiv cs.AI
arXiv:2410.21316v2. Transformers and large language models (LLMs) have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. Under these circumstances, the