FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Language Models

arXiv:2603.08708v1 Announce Type: new CLIP-based prompt tuning enables pretrained Vision-Language Models (VLMs) to adapt efficiently to downstream tasks. Although existing studies have made significant progress, they pay limited attention to how the internal attention representations of VLMs change during the tuning process.