AI RESEARCH
EROICA: Online Performance Troubleshooting for Large-scale Model Training
arXiv CS.LG
•
ArXi:2506.08528v4 Announce Type: replace-cross Troubleshooting performance problems of large model