Image to Video with Song (open source)
r/StableDiffusion
Generative AI
This music video was made entirely locally using open-source models: ZIT for the image + an LLM for the lyrics + AceStep1.5 for the song + Wan2.1 for the animation + InfiniteTalk for the lip-syncing. Only the standard workflows were used. I kept the video resolution low to fit in VRAM/RAM. The whole process for this roughly 2-minute video with audio took about 1 hour.

The prompt for the video: "a woman is singing emotionally. highly expressive gestures, moving hands while singing, performing on stage."

submitted by /u/ZerOne82
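For anyone trying to reproduce the order of steps, here is a minimal sketch of the pipeline as plain data. The tool names are the ones from the post; the artifact names, the `Stage` class, the `check_order` helper, and the exact data flow (for example, that InfiniteTalk takes the animated clip plus the song) are assumptions for illustration, not the poster's actual workflow files. It does not call any of the models; it only checks that each stage's inputs exist before it runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str      # model name as given in the post
    needs: tuple   # artifacts this stage consumes (assumed)
    makes: str     # artifact this stage produces (assumed name)


# Tool names come from the post; the artifact names and the data flow
# between stages are assumptions made for illustration only.
PIPELINE = [
    Stage("ZIT", (), "image"),
    Stage("LLM", (), "lyrics"),
    Stage("AceStep1.5", ("lyrics",), "song"),
    Stage("Wan2.1", ("image",), "animation"),
    Stage("InfiniteTalk", ("animation", "song"), "music_video"),
]


def check_order(stages):
    """Return the produced artifacts in order.

    Raises ValueError if a stage needs an artifact that no earlier
    stage has produced.
    """
    available = []
    for stage in stages:
        missing = [name for name in stage.needs if name not in available]
        if missing:
            raise ValueError(f"{stage.tool} is missing inputs: {missing}")
        available.append(stage.makes)
    return available


if __name__ == "__main__":
    for artifact in check_order(PIPELINE):
        print(artifact)
```

Laying it out this way makes the dependencies explicit: the image and the lyrics can be generated independently, while the lip-sync step has to wait for both the animation and the song.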