AI RESEARCH
Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving
arXiv CS.AI
arXiv:2507.18454v2 Announce Type: replace-cross
CPUs are critical for LLM serving due to their availability, cost efficiency, and edge applicability. However, efficient CPU serving is hindered by conflicting prefill and decode resource demands under non-disaggregated deployment constraints: existing solutions fail to avoid cross-phase interference, ignore sub-NUMA hardware structures, and deliver suboptimal dynamic-shape kernel performance.