AI RESEARCH
CARES: Context-Aware Resolution Selector for VLMs
arXiv CS.AI
arXiv:2510.19496v2 Announce Type: replace-cross Large vision-language models (VLMs) commonly process images at native or high resolution to remain effective across tasks. This inflates the visual token count, often to 97-99% of total tokens, resulting in high compute cost and latency even when low-resolution images would suffice. We