Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation

arXiv:2604.09088v1 Announce Type: new Memory-efficient transfer learning (METL) approaches have recently achieved promising performance in adapting pre-trained models to downstream tasks. They avoid backpropagating gradients through the large backbone, which significantly reduces both the number of trainable parameters and the memory consumed during fine-tuning. However, since they typically employ a lightweight and learnable side network, these methods inevitably