AI RESEARCH

SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation

arXiv cs.CV

arXiv:2604.02252v1 Announce Type: new Foundational Vision Transformers (ViTs) have limited effectiveness in tasks requiring fine-grained spatial understanding, due to their fixed pre-