vGamba: Attentive State Space Bottleneck for efficient Long-range Dependencies in Visual Recognition

arXiv:2503.21262v3 Announce Type: replace

Capturing long-range dependencies (LRD) efficiently is a core challenge in visual recognition, and state-space models (SSMs) have recently emerged as a promising alternative to self-attention for addressing it. However, integrating SSMs into CNN-based bottlenecks remains challenging: existing approaches require complex pre-processing and multiple SSM replicas per block, which limits their practicality.