Forget about VAEs? SenseNova's NEO-unify hits 31.56 PSNR without an encoder. Native image gen is coming.

r/StableDiffusion

Just saw this new technical blog from SenseNova (SenseTime), and it looks like the "Frankenstein" era of sticking different models together might be ending. Instead of the usual CLIP + VAE + Diffusion setup we're used to in Stable Diffusion or FLUX, they've built a Native Unified Model called NEO-unify.

Why should we care?

No VAE/Encoder: It works directly on pixels. If you've ever struggled with VAE artifacts or losing tiny details during encoding, this architecture skips the encoding step where those losses happen.

Insane Reconstruction: It hits 31.56 PSNR (in dB, higher is better) on image reconstruction.
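
For anyone wondering what a PSNR number like 31.56 actually measures, here's a minimal sketch of the standard formula, PSNR = 10 * log10(MAX^2 / MSE). This is not the blog's evaluation code: the function names `psnr` and `rmse_for_psnr` are made up for illustration, and the 8-bit pixel range (MAX = 255) is my assumption, since the resolution, dataset, and color handling behind their number aren't covered here.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB between two equal-length flat sequences of pixel values.

    Assumes (hypothetically) 8-bit pixels, so max_value defaults to 255.
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixel values")
    if len(original) == 0:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixel values
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

def rmse_for_psnr(psnr_db, max_value=255.0):
    """Invert the formula: the root-mean-square pixel error implied by a PSNR."""
    return max_value / (10 ** (psnr_db / 20))

# Under the 8-bit assumption, 31.56 dB works out to an RMS error of
# roughly 6 to 7 gray levels per pixel value.
implied_rmse = rmse_for_psnr(31.56)
```

Because the scale is logarithmic, every extra 6 dB or so cuts the RMS error roughly in half, so small differences in PSNR can matter more than they look.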