RAE-NWM: Navigation World Model in Dense Visual Representation Space

arXiv:2603.09241v1 Announce Type: new Visual navigation requires agents to reach goals in complex environments through perception and planning. World models address this task by simulating action-conditioned state transitions to predict future observations. Current navigation world models typically learn state evolution under actions within the compressed latent space of a Variational Autoencoder, where spatial compression often discards fine-grained structural information and hinders precise control.