Running a local LLM on Android with Termux and llama.cpp

What I used

- Samsung S21 Ultra
- Termux
- llama-cpp-cli
- llama-cpp-server
- Qwen3.5-0.8B with Q5_K_M quantization, from Hugging Face

I also tried Bonsai-8B-GGUF-1bit from Hugging Face. It is a newer model and required a different setup, which I might write about later, but it produced only 2-3 tokens per second (TPS), which I did not find usable.

Installation

I downloaded the Termux app from Google Play and installed the needed tools in Termux:

    pkg update && pkg upgrade -y
    pkg install llama-cpp -y

Downloading a model

I downloaded Qwen3.5-0.8B-Q5_K_M.gguf in my browser and saved it to my device.
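Browser downloads sometimes save an error page or a truncated file under the expected name, so it can be worth checking the file before pointing llama.cpp at it. The following is a minimal sketch, not part of my original setup: GGUF files begin with the four ASCII bytes "GGUF", and the helper name is_gguf is hypothetical. The default path is an assumption: it presumes the browser saved the file to the shared Downloads folder and that Termux was given storage access with termux-setup-storage, which creates ~/storage/downloads.

```shell
# Hypothetical helper: succeeds only if the file exists and starts with
# the GGUF magic bytes ("GGUF" in ASCII).
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Assumed location; change it to wherever your browser saved the model.
MODEL="${MODEL:-$HOME/storage/downloads/Qwen3.5-0.8B-Q5_K_M.gguf}"

if is_gguf "$MODEL"; then
  echo "Looks like a GGUF file: $MODEL"
else
  echo "Missing or not a GGUF file: $MODEL"
fi
```

If the check fails, the file is probably in a different folder or the download was incomplete. Passing the check only confirms the header, not that the whole file arrived intact.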