RS-SSM: Refining Forgotten Specifics in State Space Model for Video Semantic Segmentation

arXiv:2603.24295v1 Announce Type: new Recently, state space models have demonstrated efficient video segmentation through linear-complexity state space compression. However, Video Semantic Segmentation (VSS) requires pixel-level spatiotemporal modeling capabilities to maintain temporal consistency when segmenting semantic objects. While state space models can preserve common semantic information during state space compression, the fixed-size state space inevitably forgets specific information, which limits the models' capability for pixel-level segmentation.
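
To make the forgetting argument concrete, the following is a minimal sketch of a generic diagonal linear state space recurrence, not the RS-SSM architecture itself. The function name `ssm_scan` and the parameters `decay`, `b`, and `c` are hypothetical, chosen only for illustration. It shows that the state keeps the same size however long the sequence is, and that the trace of an early input shrinks geometrically as later steps are processed.

```python
import numpy as np

def ssm_scan(x, decay, b, c):
    """Run a toy diagonal linear state space recurrence over a 1-D sequence.

    x:     (T,) input sequence
    decay: (N,) per-channel state decay, each value in (0, 1)  (assumed form)
    b, c:  (N,) input and output projections                   (assumed form)
    Returns the outputs of shape (T,) and the final state of shape (N,).
    """
    h = np.zeros_like(decay)
    ys = np.empty(len(x))
    for t, x_t in enumerate(x):
        # The entire history is compressed into N numbers at every step.
        h = decay * h + b * x_t
        ys[t] = c @ h
    return ys, h

decay = np.array([0.5, 0.9])
b = np.ones(2)
c = np.ones(2)

# A single impulse at t = 0 followed by zeros.
x = np.zeros(50)
x[0] = 1.0
ys, h_final = ssm_scan(x, decay, b, c)

# The state has N = 2 entries regardless of sequence length, and what is
# left of the first input after 49 further steps is decay**49 * b.
print(h_final.shape)
print(h_final)
```

The cost is linear in the sequence length because each step touches only the fixed-size state, which is also why details of early inputs cannot all be retained.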