Catastrophic forgetting is quietly killing local LLM fine-tuning. Anyone else hitting this wall?

r/artificial

Catastrophic forgetting remains a persistent challenge in sequential or multi-task fine-tuning of LLMs. Models often lose significant capability on previous tasks, or general knowledge, as they adapt to new domains (medical, legal, code, etc.). This seems rooted in how gradient-based optimization works: every new update is written into the same shared weights, so it overwrites earlier representations, and there is no explicit separation between fast learning and long-term consolidation.
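To make the overwrite mechanism concrete, here is a toy sketch in plain Python. It is not an LLM and not anyone's actual training setup: the one-weight linear model, the two synthetic "tasks" (`task_a`, `task_b`), and the learning rate are all assumptions chosen just to show the effect in isolation.

```python
# Toy illustration of catastrophic forgetting: one shared weight,
# trained on task A and then on task B with plain SGD.
# Everything here (model, tasks, hyperparameters) is hypothetical.

def mse(w, data):
    """Mean squared error of the linear model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    """Plain per-sample SGD on squared error; returns the updated weight."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

xs = (-1.0, -0.5, 0.5, 1.0)
task_a = [(x, 2.0 * x) for x in xs]   # task A wants w = 2
task_b = [(x, -3.0 * x) for x in xs]  # task B wants w = -3

w = 0.0
w = sgd(w, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A is learned

w = sgd(w, task_b)              # sequential fine-tuning on task B only
loss_a_after = mse(w, task_a)   # large: task A has been overwritten
loss_b_after = mse(w, task_b)   # near zero: task B is learned

print(f"task A loss before B: {loss_a_before:.6f}")
print(f"task A loss after B:  {loss_a_after:.6f}")
print(f"task B loss after B:  {loss_b_after:.6f}")
```

With a single shared parameter the conflict is total, so this exaggerates what happens in a large model, where tasks only partly overlap. The point is just that nothing in the update rule protects what was learned first.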