On Geometric Understanding and Learned Priors in Feed-forward 3D Reconstruction Models

arXiv:2512.11508v2

Feed-forward 3D reconstruction models such as DUSt3R, VGGT, and Depth Anything 3 (DA3) are transformer-based foundation models that infer camera geometry and dense scene structure in a single forward pass. Because they are trained at scale in a supervised fashion, they raise a central question: do these models build upon geometric principles akin to traditional multi-view pipelines, or do they primarily rely on learned priors arising from the large-scale