TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation

arXiv:2604.19473v1
Generating high-quality videos from complex temporal descriptions that contain multiple sequential actions remains a key open problem. Existing methods face an inherent trade-off: feeding multiple short prompts sequentially into the model improves action fidelity but compromises temporal consistency, whereas a single complex prompt preserves consistency at the cost of prompt-following capability.