AI RESEARCH
TrainMover: An Interruption-Resilient Runtime for ML Training
arXiv cs.AI
arXiv:2412.12636v3 Announce Type: replace-cross Large-scale ML