LPT: Less-overfitting Prompt Tuning for Vision-Language Model

arXiv:2410.10247v3 Announce Type: replace-cross Vision-language models (VLMs) have demonstrated exceptional generalization capabilities for downstream tasks. Prompt learning has gradually become an effective and efficient method for transferring VLMs to downstream tasks, surpassing traditional fine-tuning methods. However, during the transfer process, these models are prone to severe overfitting, leading to a significant decline in generalization ability. To address this issue, we propose a framework named LPT, specifically designed for vision-language models.