TIE: Time Interval Encoding for Video Generation over Events

arXiv CS.CV

arXiv:2605.10543v1

Director-style prompting, robotic action prediction, and interactive video agents demand temporal grounding over concurrent events. In this regime, 68% of general clips and over 99% of robotics/gameplay clips contain overlapping events, yet existing multi-event generators rest on a single-active-prompt assumption. Moreover, modern video generators such as Diffusion Transformers (DiT) represent time as discrete points through point-wise positional encodings.