Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

arXiv:2605.08626v1 Announce Type: cross Large language models (LLMs) are transforming society, powering applications from smart assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment, in turn, is constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum.