GTASA: Ground Truth Annotations for Spatiotemporal Analysis, Evaluation and Training of Video Models

arXiv:2604.10385v1 Announce Type: new Generating complex multi-actor scenario videos remains difficult even for state-of-the-art neural generators, while evaluating them is hard due to the lack of ground truth for physical plausibility and semantic faithfulness. We