AI RESEARCH

Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning

arXiv CS.LG

arXiv:2603.16127v1 Announce Type: cross We investigate the role of learning rate scheduling in the large-scale pre-