AI RESEARCH
SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data
arXiv CS.LG
•
ArXi:2605.01060v1 Announce Type: cross We present SURGE, a streaming GPU encoding system deployed in production to generate embeddings for over 800M texts across 40,000 logical partitions. Production embedding pipelines face a tension between logical data partitioning and efficient GPU utilization: processing each partition independently incurs $P$ inter-process communication (IPC) calls whose overhead limits throughput for compute-light models.