ZIT + LLM + AceStep + Wan + InfiniteTalk

r/StableDiffusion
Generative AI

Everything was generated locally: ZIT for the image, an LLM for the lyrics, AceStep 1.5 for the song, Wan 2.1 for the animation, and InfiniteTalk for lip-syncing. Each tool was used with its standard workflow only. Resolution was kept low to fit in VRAM/RAM.

Prompt (video): "a woman is singing emotionally. highly expressive gestures, moving hands while singing, performing on stage."

Submitted by /u/ZerOne82