Native audio rendering in videos is not as important as you think
r/StableDiffusion
Open Source AI
I used Olivio's tutorial for this, and I realized that unless the clip you need is just a few isolated seconds and you use it in its entirety, audio from the video model is mostly useless. Once you have to cut and edit the video, the source audio from each edited clip disrupts the narrative flow, and you end up having to make your own audio clips anyway. Almost everything here was generated with VibeVoice and Qwen TTS in ComfyUI. The videos were made with Seedance 2, Kling, and LTX 2.3. The original car model was made with Flux 2 Klein and then cleaned up with Nano Banana via the.