When are we going to see natively multimodal local text-image models?
r/StableDiffusion
Generative AI
Inputs: image/text; outputs: image/text. Predictions, please.
Submitted by /u/wojtulace