SenseNova-U1-8B-MoT (novel open-source multimodal understanding + image generation model) seems like a bigger deal architecturally than it’s getting credit for

r/LocalLLaMA

SenseNova dropped SenseNova-U1 on the last day of April, and I’ve only found, like, one other mostly ignored post on this sub talking about it. It seems like a really exciting, novel architecture to me. One of its major high points appears to be exceptional text-to-infographics generation, and it’s also supposed to be good at image editing, image generation, and visual understanding. Supposedly it’s not the traditional mash-up type of multimodal model (no VAE) that we’ve seen before.