AI RESEARCH

Streaming 4D Visual Geometry Transformer

arXiv CS.AI

arXiv:2507.11539v2

Perceiving and reconstructing 3D geometry from videos is a fundamental yet challenging computer vision task. To facilitate interactive and low-latency applications, we propose a streaming visual geometry transformer that follows a philosophy similar to that of autoregressive large language models. We explore a simple and efficient design and employ a causal transformer architecture to process the input sequence in an online manner.
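
To make "causal" and "online" concrete, here is a minimal sketch in pure Python of how causal attention can be run one input at a time. It is an illustration under stated assumptions, not the paper's implementation: the class name `StreamingCausalAttention`, the use of cached keys and values, the single attention head, and the identity projections (query = key = value = token) are all hypothetical simplifications that the abstract does not specify. The point it shows is that processing tokens one by one while attending only to the past gives the same result as an offline pass restricted to a causal prefix.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Scaled dot-product attention for one query over a list of keys/values.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


class StreamingCausalAttention:
    """Hypothetical single-head layer that caches past keys and values."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, token):
        # Assumption: identity projections, so query = key = value = token.
        self.keys.append(token)
        self.values.append(token)
        # The new token attends to itself and everything seen so far.
        return attend(token, self.keys, self.values)


def full_causal_attention(tokens):
    # Offline reference: token t attends only to tokens 0..t.
    return [
        attend(tokens[t], tokens[: t + 1], tokens[: t + 1])
        for t in range(len(tokens))
    ]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for frame tokens
    layer = StreamingCausalAttention()
    for frame in frames:
        print(layer.step(frame))
```

In this sketch each call to `step` touches only the new token plus the cache, so earlier inputs are never recomputed when a new one arrives. That is the property that makes a causal design suitable for low-latency use.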