Latent Video Prediction Learns Better World Models

arXiv:2605.15618v1 Announce Type: cross

Self-supervised video models are increasingly framed as world models, yet their evaluation remains largely confined to a single top-1 accuracy score on clean benchmarks. This leaves a major gap in our understanding of their potential as world models.