Catastrophic forgetting is quietly killing local LLM fine-tuning, anyone else hitting this wall?

Catastrophic forgetting remains a persistent challenge when performing sequential or multi-task fine-tuning on LLMs. Models often lose significant capability on previous tasks or general knowledge as they adapt to new domains (medical, legal, code, etc.). This seems rooted in the fundamental way gradient-based optimization works and new updates overwrite earlier representations without any explicit separation between fast learning and long-term consolidation.