AI RESEARCH

ZeRO-Prefill: Zero Redundancy Overheads in MoE Prefill Serving

arXiv CS.LG

arXiv:2605.02960v1 (Announce Type: new)

Production LLM workloads increasingly serve discriminative tasks, such as classification, recommendation, and verification, whose answers are read from the logits of a single prefill pass with no autoregressive decoding.
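To make "read from the logits of a single prefill pass" concrete, here is a minimal sketch of how a discriminative answer could be extracted without any decoding loop. It is not taken from the paper: the function name, the toy vocabulary, and the label-to-token mapping are all hypothetical, and a real system would obtain the final-position logits from the model's prefill output.

```python
import math


def classify_from_prefill_logits(last_logits, label_token_ids):
    """Pick a label from the final-position logits of one prefill pass.

    last_logits: list of floats, one per vocabulary entry (hypothetical input).
    label_token_ids: dict mapping label name -> vocabulary index (hypothetical).
    """
    # Keep only the logits of the tokens that stand for candidate answers.
    scores = {label: last_logits[tok] for label, tok in label_token_ids.items()}
    # Numerically stable softmax restricted to the candidate labels.
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    # The answer is the highest-probability label; no token is ever generated.
    best = max(probs, key=probs.get)
    return best, probs


# Toy 6-entry vocabulary; indices 2 and 4 are assumed to mean "yes" and "no".
toy_logits = [0.1, -1.0, 2.0, 0.3, 0.5, -0.2]
label, probs = classify_from_prefill_logits(toy_logits, {"yes": 2, "no": 4})
```

Because the answer depends only on the logits at the last prompt position, the request is complete as soon as prefill finishes, which is why such workloads never enter the autoregressive decoding phase.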