AI RESEARCH
Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models
arXiv cs.CV
arXiv:2508.06038v3 Announce Type: replace
Vision-Language Models (VLMs) incur substantial computational overhead and inference latency due to the large number of vision tokens