Hybrid on-device inference on Android: llama.cpp + LiteRT + NPU/GPU routing

r/LocalLLaMA

Hi everyone, I’m the maintainer of Box, a fork of Google’s AI Edge Gallery that I’ve been extending into a fully offline AI assistant for Android. Full disclosure: I built this project. It runs entirely on-device (no cloud, no accounts, no external inference) and combines multiple local inference backends in a single app.