GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models

arXiv:2511.22125v2 Announce Type: replace Visual and textual soft prompt tuning can effectively improve the adaptability of Vision-Language Models (VLMs) to downstream tasks. However, fine-tuning on video tasks impairs the model's ability to generalize to unseen classes. Existing methods attempt to mitigate this forgetting effect by regularizing the gap between hand-crafted prompts and soft prompts, but this regularization also weakens the learning ability of the soft prompts.