Split-on-Share: Mixture of Sparse Experts for Task-Agnostic Continual Learning

arXiv:2601.17616v2 Announce Type: replace

Continual learning in Large Language Models (LLMs) is hindered by the plasticity-stability dilemma, where acquiring new capabilities often leads to catastrophic forgetting of previous knowledge. Existing methods typically treat parameters uniformly, failing to distinguish between task-specific knowledge and shared capabilities. We