PCA-Seg: Revisiting Cost Aggregation for Open-Vocabulary Semantic and Part Segmentation

arXiv:2603.17520v1

Recent advances in vision-language models (VLMs) have spurred substantial interest in open-vocabulary semantic and part segmentation (OSPS). However, existing methods extract image-text alignment cues from cost volumes through a serial structure of spatial and class aggregation, which causes knowledge interference between class-level semantics and spatial context.
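To make "cost volume" and "serial aggregation" concrete, here is a minimal NumPy sketch of the baseline structure the abstract criticizes, not of PCA-Seg itself. It assumes the common construction in which the cost volume holds the cosine similarity between every image patch embedding and every class text embedding. The functions `spatial_aggregation` and `class_aggregation` are hypothetical stand-ins: a fixed box filter and a fixed class-mean mixing take the place of the learned modules that real methods use. Because the two steps are chained, the class step only ever sees scores that the spatial step has already smoothed. This coupling is where the abstract locates the interference.

```python
import numpy as np


def cost_volume(img_feats, txt_feats):
    """Cosine-similarity cost volume.

    img_feats: (H, W, D) patch embeddings; txt_feats: (C, D) class text embeddings.
    Returns an (H, W, C) array with values in [-1, 1].
    """
    img = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=-1, keepdims=True)
    return np.einsum("hwd,cd->hwc", img, txt)


def spatial_aggregation(cost, k=3):
    """Hypothetical stand-in: k x k box filter over (H, W), applied per class."""
    H, W, _ = cost.shape
    pad = k // 2
    padded = np.pad(cost, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W, :]
    return out / (k * k)


def class_aggregation(cost):
    """Hypothetical stand-in: mix each class score with the per-pixel mean over classes."""
    return 0.5 * cost + 0.5 * cost.mean(axis=-1, keepdims=True)


def serial_aggregation(cost):
    """Serial structure: class aggregation consumes the spatially aggregated output."""
    return class_aggregation(spatial_aggregation(cost))


rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8, 16))  # toy 8x8 patch grid, 16-dim embeddings
txt = rng.standard_normal((5, 16))     # toy embeddings for 5 class names
cost = cost_volume(img, txt)
refined = serial_aggregation(cost)
labels = refined.argmax(axis=-1)       # per-pixel class prediction
```

In this toy setup `cost` and `refined` both have shape `(8, 8, 5)` and `labels` has shape `(8, 8)`. A design that decouples the two aggregations would let the class step see the unsmoothed `cost` as well; the sketch above deliberately does not.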