Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
arXiv cs.LG
arXiv:2603.16127v1 Announce Type: cross
We investigate the role of learning rate scheduling in the large-scale pre-