Imagen 2 - what architecture is it using?

I know there are something like 10 good local image models, but to me the newest image model from OpenAI seems like a real evolution. So I want to ask: does anybody have an idea what kind of architecture it is using? That image model really does understand natural language.

submitted by /u/Single_Ring4886