AI RESEARCH

Phrase-Instance Alignment for Generalized Referring Segmentation

arXiv cs.LG

arXiv:2411.15087v2 Announce Type: replace-cross

Generalized referring expressions can describe one object, several related objects, or none at all. Existing generalized referring segmentation (GRES) models treat all cases alike, predicting a single binary mask and ignoring how linguistic phrases correspond to distinct visual instances. To address this, we reformulate GRES as an instance-level reasoning problem: the model first predicts multiple instance-aware object queries conditioned on the referring expression, then aligns each query with its most relevant phrase.
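The alignment step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each object query and each phrase is already embedded as a plain vector, uses cosine similarity as a stand-in scoring function, and introduces a hypothetical `min_score` threshold so that a query with no sufficiently similar phrase stays unaligned (one possible way to represent the "none at all" case). The names `cosine`, `align_queries_to_phrases`, and `min_score` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_queries_to_phrases(query_embs, phrase_embs, min_score=0.5):
    """Pair each object query with the index of its most similar phrase.

    Returns one (phrase_index, score) tuple per query. phrase_index is None
    when there are no phrases or the best score falls below min_score
    (min_score is a hypothetical threshold, not taken from the paper).
    """
    alignments = []
    for query in query_embs:
        scores = [cosine(query, phrase) for phrase in phrase_embs]
        if not scores:
            alignments.append((None, 0.0))
            continue
        # Index of the highest-scoring phrase for this query.
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= min_score:
            alignments.append((best, scores[best]))
        else:
            alignments.append((None, scores[best]))
    return alignments


# Toy embeddings: two phrases and three instance queries (made-up values).
phrases = [[0.9, 0.1], [0.1, 0.9]]
queries = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
for i, (phrase_idx, score) in enumerate(align_queries_to_phrases(queries, phrases)):
    print(f"query {i} -> phrase {phrase_idx} (score {score:.3f})")
```

In this toy setup the first two queries each pick a different phrase, while the third matches neither and stays unaligned. A real model would learn the embeddings and the matching jointly, so the hard argmax here is only a simplification.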