AI RESEARCH

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

arXiv cs.CL

arXiv:2510.24793v3 Announce Type: replace

We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rust, the system delivers 50,000 requests per second through static embedding lookup, mean pooling, and zero-copy IEEE 754 binary serialization.
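
The pipeline named in the abstract (static embedding lookup, mean pooling, IEEE 754 binary serialization) can be illustrated with a minimal sketch. This is not the SwiftEmbed implementation, which is written in Rust. The toy vocabulary, the 4-dimensional table, the whitespace tokenizer, and the function names `tokenize`, `embed`, `to_bytes`, and `from_bytes` are all assumptions made for illustration. Note also that `struct.pack` copies the data, so the sketch shows only the wire format (little-endian float32), not the zero-copy aspect.

```python
import struct

# Hypothetical toy vocabulary and 4-dimensional embedding table.
# A real static model ships a much larger table and a subword tokenizer.
VOCAB = {"fast": 0, "text": 1, "embeddings": 2, "[UNK]": 3}
TABLE = [
    [1.0, 0.0, 2.0, 0.0],  # "fast"
    [0.0, 1.0, 0.0, 2.0],  # "text"
    [2.0, 2.0, 1.0, 1.0],  # "embeddings"
    [0.0, 0.0, 0.0, 0.0],  # "[UNK]"
]
DIM = 4


def tokenize(text):
    """Whitespace tokenizer standing in for the model's real tokenizer."""
    unk = VOCAB["[UNK]"]
    return [VOCAB.get(tok, unk) for tok in text.lower().split()]


def embed(text):
    """Static lookup followed by mean pooling; no neural forward pass."""
    ids = tokenize(text)
    if not ids:
        return [0.0] * DIM
    sums = [0.0] * DIM
    for token_id in ids:
        row = TABLE[token_id]  # one table read per token
        for d in range(DIM):
            sums[d] += row[d]
    return [s / len(ids) for s in sums]


def to_bytes(vec):
    """Serialize as little-endian IEEE 754 float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def from_bytes(buf):
    """Decode a buffer produced by to_bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))


if __name__ == "__main__":
    vector = embed("fast text embeddings")
    payload = to_bytes(vector)
    print(vector, len(payload))
```

Because each request reduces to table reads and an average, the cost grows with the number of tokens rather than with model depth, which is the property a static-embedding server relies on for low latency.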