OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

arXiv:2604.18326v1 Announce Type: new Recent advances in joint audio-video generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in complex, real-world physical scenes remains a significant challenge. We trace the root cause to structural deficiencies in existing datasets along three dimensions: limited global scene and camera diversity, sparse interaction modeling (both person-person and person-object), and insufficient individual attribute alignment.