SMFormer: Empowering Self-supervised Stereo Matching via Foundation Models and Data Augmentation

arXiv:2604.10218v1

Recent self-supervised stereo matching methods have made significant progress. They typically rely on the photometric consistency assumption, which presumes that corresponding points across views share the same appearance. However, real-world disturbances can violate this assumption, producing invalid supervisory signals and a significant accuracy gap relative to supervised methods. To address this issue, we propose SMFormer, a framework that integrates reliable self-supervision guided by a Vision Foundation Model (VFM) and data augmentation.
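
To make the photometric consistency assumption concrete, the following is a minimal NumPy sketch of a generic photometric reconstruction loss for a rectified grayscale stereo pair. It is not SMFormer's loss: the function names `warp_right_to_left` and `photometric_loss`, the nearest-neighbour sampling, and the plain L1 penalty are all assumptions chosen for illustration.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at column x - d.

    Hypothetical helper: nearest-neighbour sampling is used for brevity,
    whereas practical pipelines usually use differentiable bilinear sampling.
    """
    h, w = right.shape
    # In a rectified pair, left pixel (y, x) corresponds to right pixel (y, x - d).
    xs = np.arange(w)[None, :] - np.rint(disparity).astype(int)
    valid = (xs >= 0) & (xs < w)          # pixels whose match lies inside the image
    xs = np.clip(xs, 0, w - 1)
    rows = np.arange(h)[:, None]
    return right[rows, xs], valid

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the left image and its reconstruction."""
    recon, valid = warp_right_to_left(right, disparity)
    return float(np.abs(left - recon)[valid].mean())

# Synthetic pair with a constant true disparity of 3 pixels.
rng = np.random.default_rng(0)
left = rng.random((4, 8))
right = np.roll(left, -3, axis=1)         # right[y, x - 3] == left[y, x] for x >= 3
disp = np.full((4, 8), 3.0)

clean = photometric_loss(left, right, disp)            # assumption holds
disturbed = photometric_loss(left, right * 1.2, disp)  # brightness change in one view
print(clean, disturbed)
```

With the correct disparity the loss vanishes when both views share the same appearance, but it becomes nonzero once one view is brightened, even though the disparity is still correct. This is the kind of invalid supervisory signal the abstract refers to.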