TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection

arXiv:2604.26772v1 Announce Type: new Recent methods demonstrate that large-scale pretrained models, such as CLIP vision transformers, effectively detect AI-generated images (AIGIs) from unseen generative models when used as feature extractors. Many state-of-the-art methods for AI-generated image detection build upon the original CLIP-ViT to enhance this generalization. Since CLIP's release, numerous vision foundation models (VFMs) have emerged, incorporating architectural improvements and different.