Human-like Object Grouping in Self-supervised Vision Transformers

arXiv:2603.13994v1 Announce Type: cross Vision foundation models trained with self-supervised objectives achieve strong performance across diverse tasks and exhibit emergent object segmentation properties. However, their alignment with human object perception remains poorly understood. Here, we