AI RESEARCH

Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models

arXiv cs.CV

arXiv:2508.06038v3

Vision-Language Models (VLMs) incur substantial computational overhead and inference latency due to the large number of vision tokens