AI RESEARCH

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

arXiv CS.AI

arXiv:2603.19223v1 Announce Type: cross We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60M publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding