StructSAM: Structure- and Spectrum-Preserving Token Merging for Segment Anything Models

arXiv:2603.07307v1 Announce Type: cross Recent token merging techniques for Vision Transformers (ViTs) provide substantial speedups by reducing the number of tokens processed by self-attention, often without re