Google unifies text, image, video, and audio in a single vector space with Gemini Embedding 2

The Decoder
Generative AI

Google's first native multimodal embedding model brings text, images, video, audio, and documents into one vector space, cutting out the need for separate models in AI pipelines.