AI RESEARCH
SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation
arXiv cs.CV
arXiv:2604.02252v1 Announce Type: new Foundational Vision Transformers (ViTs) have limited effectiveness in tasks requiring fine-grained spatial understanding, due to their fixed pre-