FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation

arXiv:2603.25993v1 Announce Type: new

While recent feed-forward 3D reconstruction models provide a strong geometric foundation for scene understanding, extending them to 3D instance segmentation typically relies on a disjointed "lift-and-cluster" paradigm. Grouping dense pixel-wise embeddings via non-differentiable clustering scales poorly with the number of views and disconnects representation learning from the final segmentation objective.
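
To make the "lift-and-cluster" critique concrete, the following is a minimal sketch of such a baseline, not the method of this paper or of any specific prior work. The function name `lift_and_cluster`, the `threshold` parameter, the 2D toy embeddings, and the greedy leader clustering are all assumptions chosen for illustration. The sketch pools per-pixel embeddings from every view and groups them with a hard, threshold-based assignment, which shows why the step carries no gradient and why its work grows as views are added.

```python
import math


def lift_and_cluster(views, threshold=0.5):
    """Hypothetical baseline: pool per-pixel embeddings, then group them greedily."""
    # "Lift": flatten the per-pixel embeddings of every view into one pool,
    # so the pool size grows linearly with the number of views.
    pooled = [emb for view in views for emb in view]

    leaders = []      # one representative embedding per discovered instance
    labels = []       # instance index assigned to each pooled embedding
    comparisons = 0   # distance evaluations, a rough proxy for clustering cost

    for emb in pooled:
        assigned = None
        for idx, leader in enumerate(leaders):
            comparisons += 1
            if math.dist(emb, leader) <= threshold:
                # Hard assignment: a discrete decision, so no gradient can
                # flow from the segmentation result back to the embeddings.
                assigned = idx
                break
        if assigned is None:
            leaders.append(emb)
            assigned = len(leaders) - 1
        labels.append(assigned)
    return labels, comparisons


# Two toy views that each observe the same two instances (made-up data).
view_a = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
view_b = [(0.0, 0.1), (5.1, 5.0)]
labels, comparisons = lift_and_cluster([view_a, view_b])
print(labels, comparisons)
```

In this toy setup every added view contributes more pooled embeddings, and each one is compared against the existing leaders, so the clustering cost rises with the view count while remaining outside any training loss.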