Adversarial Video Promotion Against Text-to-Video Retrieval

arXiv:2508.06964v3

Thanks to the development of cross-modal models, text-to-video retrieval (T2VR) is advancing rapidly, but its robustness remains largely unexamined. Existing attacks against T2VR are designed to push videos away from queries, i.e., to suppress the ranks of videos, whereas attacks that pull videos towards selected queries, i.e., that promote the ranks of videos, remain largely unexplored. Such promotion attacks can be impactful, since attackers may gain views and clicks for financial benefit or spread (mis)information widely.
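The contrast between suppression and promotion comes down to the retrieval rank that a video receives for a text query. The sketch below is a toy illustration under loud assumptions, not the method of this paper. Videos and queries are already-encoded embeddings, similarity is cosine, and the "attack" moves the video embedding directly. A real attack would instead perturb the video frames through the video encoder under a pixel budget. The helper `pull_towards` is hypothetical: it shifts the target video by `eps` along the mean direction of the selected queries, which raises that video's rank for those queries.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def rank_of(target, query, gallery):
    """1-based retrieval rank of gallery[target] for `query` (1 = top result).

    Only videos with strictly higher similarity count against the target,
    so ties are resolved in the target's favour.
    """
    s = cosine(gallery[target], query)
    return 1 + sum(1 for i, v in enumerate(gallery)
                   if i != target and cosine(v, query) > s)


def pull_towards(v, queries, eps):
    """Hypothetical one-step 'promotion' in embedding space.

    Shifts embedding v by eps along the mean direction of the unit-normalised
    target queries. This is a toy, not a pixel-level adversarial attack.
    """
    units = [[x / norm(q) for x in q] for q in queries]
    centroid = [sum(col) / len(units) for col in zip(*units)]
    c = norm(centroid)
    if c == 0:  # queries cancel out, so there is no single direction to pull towards
        return list(v)
    return [x + eps * y / c for x, y in zip(v, centroid)]


if __name__ == "__main__":
    gallery = [[1.0, 0.0], [3.0, 4.0], [4.0, 3.0]]  # toy video embeddings
    query = [0.0, 1.0]                               # toy text-query embedding
    print(rank_of(0, query, gallery))                # video 0 starts last: 3
    promoted = list(gallery)
    promoted[0] = pull_towards(gallery[0], [query], eps=3.0)
    print(rank_of(0, query, promoted))               # now ranked first: 1
```

In this toy run video 0 starts at rank 3 (cosine 0.0 versus 0.8 and 0.6 for the other videos). With `eps=3.0` it moves to `[1.0, 3.0]`, whose cosine to the query is about 0.95, so it reaches rank 1. A suppression attack would do the opposite and push the embedding away from the query so that its rank drops.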