AI RESEARCH

WorldJen: An End-to-End Multi-Dimensional Benchmark for Generative Video Models

arXiv cs.CV

arXiv:2605.03475v1 Announce Type: new

Evaluating generative video models remains an open problem. Reference-based metrics such as the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) reward pixel fidelity over semantic correctness, while Fréchet Video Distance (FVD) favors distributional textures over physical plausibility. Benchmarks based on binary Visual Question Answering (VQA), such as VBench 2.0, are prone to yes-bias and rely on low-resolution auditors that miss temporal failures.
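To make the point about pixel fidelity concrete, here is a minimal sketch of the standard PSNR formula applied to toy data. The frames, their sizes, and the names `psnr`, `reference`, `shifted`, and `flat_gray` are illustrative assumptions and are not taken from the paper. The sketch compares a striped frame against a copy shifted by one pixel (same content, slightly displaced) and against a flat gray frame (content destroyed).

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale frames."""
    squared_errors = [
        (r - c) ** 2
        for ref_row, cand_row in zip(reference, candidate)
        for r, c in zip(ref_row, cand_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 8x8 grayscale frames, used only for illustration.
WIDTH, HEIGHT = 8, 8

# Vertical stripes: bright on even columns, dark on odd columns.
reference = [[255 if x % 2 == 0 else 0 for x in range(WIDTH)] for _ in range(HEIGHT)]

# The same stripes displaced by one pixel: the content is preserved.
shifted = [[255 if x % 2 == 1 else 0 for x in range(WIDTH)] for _ in range(HEIGHT)]

# A uniform gray frame: the stripes are gone entirely.
flat_gray = [[128] * WIDTH for _ in range(HEIGHT)]

psnr_shifted = psnr(reference, shifted)
psnr_gray = psnr(reference, flat_gray)

print(f"PSNR, shifted stripes: {psnr_shifted:.2f} dB")
print(f"PSNR, flat gray frame: {psnr_gray:.2f} dB")
```

In this toy case the gray frame scores higher than the shifted stripes, even though only the shifted frame still shows the original pattern. This is one simple way a pixel-level metric can disagree with a semantic judgment.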