FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation

arXiv:2506.23323v4 Announce Type: replace

Open-vocabulary semantic segmentation (OVSS) aims to segment objects from arbitrary text categories without requiring densely annotated datasets. Although contrastive-learning-based models enable zero-shot segmentation, they often lose fine spatial precision at the pixel level because of global representation bias. In contrast, diffusion-based models naturally encode fine-grained spatial features via attention mechanisms that capture both global context and local details.