ZeRO-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
arXiv cs.LG
arXiv:2605.02960v1 Announce Type: new
Production LLM workloads increasingly serve discriminative tasks, such as classification, recommendation, and verification, whose answers are read from the logits of a single prefill pass with no autoregressive decoding.
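To make "read from the logits of a single prefill pass" concrete, here is a minimal sketch in plain Python. It is not the paper's system: the function name `classify_from_prefill_logits`, the toy logits, and the label-to-token mapping are all hypothetical stand-ins for whatever a real model and tokenizer would produce. The only assumption carried over from the text is that the answer comes from the final-position logits, with no decoding loop.

```python
import math

def classify_from_prefill_logits(logits, label_token_ids):
    """Pick a label using only the final-position logits of one prefill pass.

    logits: list of per-position logit rows, shape (seq_len, vocab_size).
            Hypothetical stand-in for a model's prefill output.
    label_token_ids: dict mapping label name -> vocabulary id (hypothetical).
    """
    last = logits[-1]  # logits at the last prompt position
    names = list(label_token_ids)
    scores = [last[label_token_ids[name]] for name in names]

    # Numerically stable softmax restricted to the candidate label tokens.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}

    # The prediction is the highest-probability label; no token is generated.
    return max(probs, key=probs.get), probs

# Toy prefill output: 3 prompt positions, vocabulary of 5 tokens (made-up values).
toy_logits = [
    [0.1, 0.2, 0.0, -0.3, 0.4],
    [0.0, 0.5, 0.1, 0.2, -0.1],
    [0.3, 2.0, -1.0, 0.0, 0.1],
]
label, probs = classify_from_prefill_logits(toy_logits, {"yes": 1, "no": 2})
print(label, probs)
```

In this sketch the whole request finishes after one forward pass over the prompt, which is why such workloads are prefill-only: there is no per-token decode step to schedule.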