[R] Dynin-Omni: masked diffusion-based omnimodal foundation model

https://dynin.ai/omni/

We introduce Dynin-Omni, a first masked diffusion-based omnimodal foundation model that unifies text, image, video, and speech understanding and generation, achieving strong cross-modal performance within a single architecture.

Interesting approach. What do you think? I am personally skeptical of the benefit of unifying all modalities into a single set of weights, but it is a unique approach indeed.

submitted by /u/marcusaureliusN
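For anyone unfamiliar with the "masked diffusion" part: generation starts from a sequence of mask tokens and fills them in over a few parallel steps, committing the most confident predictions first, instead of decoding left to right. Below is a minimal sketch of that generic decoding loop (MaskGIT/LLaDA-style confidence-based unmasking). It is NOT Dynin-Omni's implementation: `toy_predictor` is a hypothetical stand-in that returns random logits, and `MASK = -1` is an assumed mask id. In a unified omnimodal model the token ids would presumably come from a shared vocabulary spanning text, image, and speech tokenizers, which is also an assumption here.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id (assumption, not from the source)


def toy_predictor(tokens, vocab_size, rng):
    # Hypothetical stand-in for a trained network: random logits per position.
    return rng.normal(size=(len(tokens), vocab_size))


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def masked_diffusion_decode(prompt, gen_len, vocab_size, steps, seed=0):
    """Fill gen_len masked positions after the prompt in `steps` parallel rounds."""
    rng = np.random.default_rng(seed)
    tokens = np.array(list(prompt) + [MASK] * gen_len, dtype=np.int64)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        probs = softmax(toy_predictor(tokens, vocab_size, rng))
        pred = probs.argmax(axis=-1)   # best token per position
        conf = probs.max(axis=-1)      # its probability, used as confidence
        # Spread the remaining masks evenly over the remaining steps;
        # on the last step this reveals everything that is left.
        k = int(np.ceil(masked.size / (steps - step)))
        # Commit the k most confident masked positions, keep the rest masked.
        order = masked[np.argsort(-conf[masked])]
        tokens[order[:k]] = pred[order[:k]]
    return tokens.tolist()


print(masked_diffusion_decode([3, 1, 4], gen_len=5, vocab_size=10, steps=3))
```

The number of steps trades speed for quality: fewer steps means more tokens committed in parallel per network call.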