AI RESEARCH

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

arXiv cs.CV

arXiv:2412.04446v2 Announce Type: replace Videos are inherently temporal sequences. In this work, we explore the potential of modeling videos in a chronological and scalable manner with autoregressive (AR) language models, inspired by their success in natural language processing. We