SenseNova-U1 just dropped: native multimodal generation and understanding in one model, no VAE, no diffusion

What's new:

Text rendering in images actually works. Diffusion models tend to scramble text because they lack a native language-understanding pathway. U1 has one, because it's natively multimodal. Posters with long titles, slides with bullet points, comics with speech bubbles: all clean.

Infographics & dense visual output: posters, annotated diagrams, multi-panel layouts. Diffusion models fundamentally struggle with these because they operate on latents rather than on semantic content.