Edit2Interp: Adapting Image Foundation Models from Spatial Editing to Video Frame Interpolation with Few-Shot Learning

arXiv:2603.15003v1 Announce Type: new Pre-trained image editing models exhibit strong spatial reasoning and object-aware transformation capabilities acquired from billions of image-text pairs, yet they possess no explicit temporal modeling. This paper demonstrates that these spatial priors can be repurposed to unlock temporal synthesis capabilities through minimal adaptation - without