
FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

arXiv cs.LG

arXiv:2603.09046v1 Announce Type: cross

Device-side Large Language Models (LLMs) have seen explosive growth, offering greater privacy and availability than cloud-side LLMs. During LLM inference, both the model weights and the user data are valuable, and attackers may even compromise the OS kernel to steal them. ARM TrustZone is the de facto hardware-based isolation technology on mobile devices and is used to protect sensitive applications from a compromised OS. However, protecting LLM inference with TrustZone incurs significant overhead due to its inflexible isolation of memory and the