OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

arXiv:2603.12265v1

Modern visual agents require representations that are general, causal, and physically structured to operate in real-time streaming environments. However, current vision foundation models remain fragmented, each specializing narrowly in image semantic perception, offline temporal modeling, or spatial geometry. This paper