Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

ArXi:2604.11707v1 Announce Type: new Accurate future video prediction requires both high visual fidelity and consistent scene semantics, particularly in complex dynamic environments such as autonomous driving. We present Re2Pix, a hierarchical video prediction framework that decomposes forecasting into two stages: semantic representation prediction and representation-guided visual synthesis.