HiDream-O1-Image - A pixel-space model, no need for a VAE, 8B parameters.
r/StableDiffusion
Model
HiDream-O1-Image: 50 steps
HiDream-O1-Image-De: 28 steps

HiDream-O1-Image is a natively unified image-generation foundation model built on a Pixel-level Unified Transformer (UiT). It uses no external VAE and no separate text encoder. Instead, it natively encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting text-to-image generation, image editing, and subject-driven personalization at resolutions up to 2,048 × 2,048.

Key Features
Pixel-Level Unified Transformer - One end-to-end model on raw pixels, no VAE, no separate text encoder.
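The post doesn't say how raw pixels become tokens. Below is a minimal sketch of one common way a VAE-free model can do it, assuming a ViT-style patchify step. The patch size, the [-1, 1] pixel normalization, the segment tags, and every function name here are hypothetical and not taken from HiDream's code. The sketch only shows sequence layout: text, an optional condition image, and the target image end up as tokens in one sequence that a single transformer attends over. A real model would also project every vector to a shared hidden width with learned layers, which this sketch leaves out.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B), each 0..255
Image = List[List[Pixel]]        # image[row][col]
Token = Tuple[str, List[float]]  # (segment tag, feature vector)


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Cut an H x W RGB image into non-overlapping patch x patch tiles and
    flatten each tile, row by row, into one vector of length patch*patch*3."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("height and width must be divisible by patch")
    vectors: List[List[float]] = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec: List[float] = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    # Map 0..255 to [-1, 1]; raw values go in directly, no VAE latent.
                    vec.extend(ch / 127.5 - 1.0 for ch in image[r][c])
            vectors.append(vec)
    return vectors


def build_sequence(text_ids: List[int], target: Image, patch: int,
                   condition: Optional[Image] = None) -> List[Token]:
    """Concatenate text, optional condition-image, and target-image tokens
    into one sequence. Hypothetical: text ids stand in for embeddings, and
    no projection to a shared hidden size is applied."""
    seq: List[Token] = [("text", [float(t)]) for t in text_ids]
    if condition is not None:
        # e.g. the source image for editing or the subject for personalization
        seq += [("cond", v) for v in patchify(condition, patch)]
    seq += [("target", v) for v in patchify(target, patch)]
    return seq


def image_token_count(height: int, width: int, patch: int) -> int:
    """Number of patch tokens one image contributes to the sequence."""
    return (height // patch) * (width // patch)


if __name__ == "__main__":
    red = [[(255, 0, 0)] * 4 for _ in range(4)]  # 4x4 solid red image
    seq = build_sequence([101, 7, 42], red, patch=2, condition=red)
    print(len(seq))                            # 3 text + 4 cond + 4 target
    print(image_token_count(2048, 2048, 16))   # tokens for one 2,048 x 2,048 image
```

At a hypothetical patch size of 16, a single 2,048 × 2,048 image contributes (2048 / 16)² = 16,384 tokens to the sequence, before any text or condition tokens are added.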