AI News Leader · Topic
Image & Video Generation
The latest Image & Video Generation news, research, and analysis, continuously tracked across the AI landscape.
24 recent stories
-
JoyAI-Echo - Large Scale LTX-2.3 finetune for long form (5min) coherent stories.
Shared on r/StableDiffusion, JoyAI-Echo is a large-scale finetune of LTX-2.3 for generating coherent, long-form stories lasting up to five minutes. This mo…
-
Why do people like flux2 klein edit so much?
Basically the title. I've played around with the edit functionality a fair bit, and it just doesn't seem that good compared to qwen image edit. It changes the lighting, distorts…
-
UPDATE: NexusBTA v0.2.22 is out, a UI with pre-made Comfy workflows
NexusBTA v0.2.22 is out: my web UI using ComfyUI as the back end, with pre-made workflows. Compatible with ANIMA, WAN 2.2, LTX 2.3, SD 1.5, SDXL, ILLUSTRIOUS, PONY, FLUX, FLUX 2 KLEIN, FLUX…
-
I compared 62 samplers and 16 schedulers for Z-Image Turbo and rated the image quality so you don't have to 😉
Here's a sampler/scheduler comparison table for image generation with Z-Image Turbo. Obviously it reads like Red < Orange < Yellow < Green. You're welcome! PS. If you don't like…
-
Do you listen to your GPU?
After watching a movie/show with the kids, my PC gets switched back to my own monitor and the regular sound comes out of my speakers again, but the main speakers are still hooke…
-
Help a n00b install SD who keeps getting error messages.
And please explain what I need to do as if I'm five years old with no prior knowledge of coding or how any of this all works under the hood. I'm good at following directions, ev…
-
I tested 4 VLMs as "bad hands" detectors. Here's which one works best as a judge
We all know that hands can be hard for small models, so I tried to find the best way to detect bad hands with my setup (GX10 Spark). I thought any VLM like Gemma would work, but…
-
Anima testing for complex scene
I'm always working with Claude to find the best way to write prompts, and this is where I'm at right now. Prompt: highres, sensitive, A wide shot from slightly below frames an a…
-
MISO-TTS: 8-billion-parameter text-to-speech model released.
TTS 8B is a text-to-speech model based on the Sesame CSM architecture. It generates Mimi audio codes from text and optional audio context, using a large Llama 3.2-style backbone…
-
Pixel Diffusion Transformers for Image Generation, 1.3B, no VAE
PixelDiT is a 1.3B parameter text-to-image model by NVIDIA with image editing capabilities. Key features: VAE-free; dual-level architecture (patch-level DiT + pixel-level DiT); MM-…
-
1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So tiny!
AI model news: 1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So t…
-
New to Qwen Image Edit, it seems to fail a lot of commands, what am I doing wrong?
Hey guys! I've just discovered "edit" models, and it's, frankly, the HOLY GRAIL of image generation. Its consistency is just mind-blowing. That is, as long as I stick to clothes…
-
Is it possible to make z-image-turbo generate perfect text?
I have tried everything, asking Claude and using tips from the internet, but I was never able to reliably produce 100% perfect text, even if I only need short text consisting of a…
-
You can now make Mac generate high quality songs - ported Khala Music Ai to Apple Silicon
Hello all. A couple of days ago I posted about my couple of weeks' struggle to port Pixal3D over to Apple Silicon so I could generate pretty decent 3D models. Well, after that I put…
-
Flux klein 9b Comic Character Lora?
I like the comic style art that can already be achieved with Flux Klein 9B. So that my described character does not always have different hair, jewelry and a different pattern i…
-
How to make Forge NEO faster?
I have been using old Forge, and I had to reset my entire PC recently, so I am now using NEO. The thing is, it feels slow, and sometimes images take forever to load compared to before. Is t…
-
Beginner prompting Guide for LTX 2.3 : tips and tricks
If you’ve been messing around with LTX 2.3 lately, you probably realized pretty quickly that this model is incredibly sensitive to text inputs. If you just throw standard Gen-AI…
-
Should I use ComfyUI or Wan2GP?
I am trying to choose between two models (LTX 2.3 and Wan 2.2) and two tools (ComfyUI and Wan2GP). Here are my specs: GPU: RTX 3090 Ti (24GB VRAM); RAM: 128GB system RAM; CPU: Int…
-
Omnimodal AI Movie Studio integrated in Blender
After many months of refactoring, adding a plugin system, render queue, batch generation from bundled img, speech, and txt with LTX 2.3, my omnimodal end-to-end-and-back AI movi…
-
I spent some time researching Leonardo AI vs Midjourney from a free-user perspective — here's what I found
I've been putting together a comparison of AI image generators and ended up digging through a lot of community discussions about Leonardo AI and Midjourney. One thing I specific…
-
If you HAVEN'T switched from AI Toolkit to One Trainer...
For users of AI Toolkit, a crucial update has been overlooked by many: the switch to One Trainer. This change impacts the LoRA (Low-Rank Adaptation…
-
Can ComfyUI or AI in general replace a terrible frame like this?
This is an old FMV from Chrono Trigger. Square wanted to achieve a 30 FPS video by combining the previous frame and the next frame, which causes this ugly-looking frame. Can comfy re…
-
ComfyUI v0.23.0: Support NVIDIA PixelDiT and PiD (CORE-201) by @kijai in #14103
ComfyUI v0.23.0 adds support for NVIDIA PixelDiT and PiD (CORE-201), contributed by @kijai in #14103. See r/StableDiffusion for full context.
-
PixelDiT — 1.3B pixel-space diffusion transformer, no VAE, 4GB VRAM, now 100% diffusers compatible with Qwen encoder support
PixelDiT, a 1.3 billion parameter pixel-space diffusion transformer, is now 100% diffusers compatible and supports the Qwen encoder. This significant update eliminates…