Graceful Forgetting in Generative Language Models

arXiv:2505.19715v2. Recently, the pretrain-finetune paradigm has become a cornerstone of many areas of deep learning. Although a pre-trained model generally improves both the effectiveness and the efficiency of downstream fine-tuning, studies have shown that not all knowledge acquired during pre-