ViBe: Ultra-High-Resolution Video Synthesis Born from Pure Images
arXiv cs.CV
arXiv:2603.23326v1

Transformer-based video diffusion models rely on 3D attention over spatial and temporal tokens, which incurs quadratic time and memory complexity and makes end-to-end