Gemma 4 + LiteRT-LM on mobile: much better memory/perf than my llama.cpp setup

Hi r/LocalLLaMA, I've been paying close attention to the edge AI ecosystem because it's an area where I see huge potential, and where I truly believe AI will become useful for day-to-day tasks. Around the Gemma 4 release I was already experimenting with local AI, but the memory usage I was getting, even for the smaller Gemma 3 variants, was unacceptable.