We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

arXiv:2504.17180v3 Announce Type: replace-cross Current text-to-video (T2V) generation models are increasingly popular due to their ability to produce coherent videos from textual prompts. However, these models often struggle to generate semantically and temporally consistent videos when dealing with longer, complex prompts involving multiple objects or sequential events. Additionally, the high computational costs associated with