Unlocking the Edge deployment and on-device acceleration of multi-LoRA-enabled one-for-all foundational LLM

arXiv:2604.18655v1. Deploying large language models (LLMs) on smartphones poses significant engineering challenges because of stringent constraints on memory, latency, and runtime flexibility. In this work, we present a hardware-aware framework for efficient on-device inference of a LLaMA-based multilingual foundation model supporting multiple use cases on Samsung Galaxy S24 and S25 devices, which use the Qualcomm SM8650 and SM8750 chipsets, respectively.
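The abstract does not describe the framework's internals. As a rough illustration of the generic "one base model, many LoRA adapters" idea named in the title, the sketch below shares one frozen base weight matrix `W` across all use cases and adds a per-use-case low-rank update `(alpha / r) * B A x` selected at inference time. The class `MultiLoRALinear`, its methods, the adapter names, and the `alpha / r` scaling are hypothetical choices made for this example. This is a textbook LoRA formulation, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class MultiLoRALinear:
    """Hypothetical linear layer: one shared frozen W plus per-use-case LoRA adapters."""

    def __init__(self, base_weight: Matrix, alpha: float = 1.0) -> None:
        self.base_weight = base_weight  # shape (d_out, d_in), stored once for all use cases
        self.alpha = alpha              # LoRA scaling numerator (assumed convention)
        self.adapters: Dict[str, Tuple[Matrix, Matrix, int]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        # A has shape (r, d_in), B has shape (d_out, r); r is the adapter rank.
        self.adapters[name] = (a, b, len(a))

    def forward(self, x: Vector, use_case: Optional[str] = None) -> Vector:
        y = matvec(self.base_weight, x)  # shared base path: W x
        if use_case is None:
            return y
        a, b, rank = self.adapters[use_case]
        delta = matvec(b, matvec(a, x))  # low-rank path: B (A x)
        scale = self.alpha / rank
        return [yi + scale * di for yi, di in zip(y, delta)]


if __name__ == "__main__":
    layer = MultiLoRALinear([[1.0, 0.0], [0.0, 1.0]])
    layer.add_adapter("translate", [[1.0, 1.0]], [[1.0], [0.0]])
    layer.add_adapter("summarize", [[0.0, 1.0]], [[0.0], [2.0]])
    x = [2.0, 3.0]
    print(layer.forward(x))               # [2.0, 3.0]
    print(layer.forward(x, "translate"))  # [7.0, 3.0]
    print(layer.forward(x, "summarize"))  # [2.0, 9.0]
```

The memory appeal under these assumptions is that each adapter adds only `r * (d_in + d_out)` parameters per layer. For an illustrative layer with `d_in = d_out = 4096` and `r = 16`, that is 131,072 parameters per adapter, compared with 16,777,216 for `W` itself, so switching use cases means swapping small adapters rather than whole models.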