StereoVGGT: A Training-Free Visual Geometry Transformer for Stereo Vision

arXiv:2603.29368v1

Driven by the advancement of 3D devices, stereo vision tasks, including stereo matching and stereo conversion, have emerged as a critical research frontier. Contemporary stereo vision backbones typically rely on either monocular depth estimation (MDE) models or visual foundation models (VFMs). Crucially, these models are predominantly pretrained without explicit supervision of camera poses.