AI RESEARCH
Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving
arXiv CS.LG
arXiv:2601.21351v2 Announce Type: replace

Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding. It separates the state-heavy, KV-cache-dominated Attention computation from the stateless, compute-intensive FFN computation, and the two sides exchange data through per-step communication. While AFD lets memory and compute resources scale independently, its performance is highly sensitive to the Attention/FFN provisioning ratio: a mis-sized ratio causes step-level blocking, in which one side waits for the other at every decode step, and leaves devices costly idle.
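The abstract does not spell out the paper's cost model, so what follows is only a toy sketch of why the ratio matters, built on our own assumptions. Each request costs a fixed, made-up number of microseconds of Attention work and of FFN work per decode step. That work splits evenly across the devices on each side. Communication is a fixed per-step cost that does not overlap with compute. The step is synchronous, so the faster side blocks until the slower side finishes. The names `StepCosts`, `step_breakdown`, `idle_device_us`, and `best_split` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Hypothetical per-request costs for one decode step, in microseconds (toy model)."""
    attn_us_per_req: float  # Attention time for one request on one Attention device
    ffn_us_per_req: float   # FFN time for one request on one FFN device
    comm_us: float          # fixed per-step Attention<->FFN communication time (not overlapped)


def step_breakdown(costs: StepCosts, batch: int, n_attn: int, n_ffn: int) -> tuple[float, float, float]:
    """Return (step time, Attention busy time, FFN busy time) in microseconds."""
    # Assumption: work divides evenly and linearly across the devices on each side.
    t_attn = batch * costs.attn_us_per_req / n_attn
    t_ffn = batch * costs.ffn_us_per_req / n_ffn
    # Synchronous step: the faster side blocks until the slower side finishes.
    return max(t_attn, t_ffn) + costs.comm_us, t_attn, t_ffn


def idle_device_us(costs: StepCosts, batch: int, n_attn: int, n_ffn: int) -> float:
    """Device-microseconds per step that one side spends blocked waiting for the other."""
    _, t_attn, t_ffn = step_breakdown(costs, batch, n_attn, n_ffn)
    busy = max(t_attn, t_ffn)
    return n_attn * (busy - t_attn) + n_ffn * (busy - t_ffn)


def best_split(costs: StepCosts, batch: int, total_devices: int) -> tuple[int, int]:
    """Exhaustively pick the (n_attn, n_ffn) split that minimizes step time."""
    splits = [(a, total_devices - a) for a in range(1, total_devices)]
    return min(splits, key=lambda s: step_breakdown(costs, batch, *s)[0])


if __name__ == "__main__":
    # Illustrative numbers only: Attention is assumed 3x costlier per request than FFN.
    costs = StepCosts(attn_us_per_req=300.0, ffn_us_per_req=100.0, comm_us=2000.0)
    batch, total = 256, 8
    for n_attn in range(1, total):
        n_ffn = total - n_attn
        step, _, _ = step_breakdown(costs, batch, n_attn, n_ffn)
        idle = idle_device_us(costs, batch, n_attn, n_ffn)
        print(f"A:F = {n_attn}:{n_ffn}  step = {step:8.1f} us  idle = {idle:9.1f} device-us")
    print("best split:", best_split(costs, batch, total))
```

With these made-up numbers, the shortest step and zero idle time occur at a 6:2 split. At that split the device ratio matches the per-request cost ratio of 3:1. Moving one device in either direction makes one side wait on the other at every step. This reproduces, in miniature, the blocking and idle-time effect the abstract describes. A realistic model would also need batch-dependent FFN efficiency and communication/compute overlap, which this sketch leaves out.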