What stops AI apps from using the user’s device as the server, running a local LLM?

I was using Gemma models offline on a plane to help me study while reading a book, and it got me thinking. Most phones can run the Gemma 2B model (many can run the 4B too), open source models will keep getting cheaper and better, and chips are increasingly being optimized for AI. Gemma 4B is, in my view, almost on par with GPT-4o, which was the top model of its time.

I think in the future it will work this way:

Client: Request -> Local LLM acting as the server -> Server: Response -> View (client)

No server compute costs for the developer, since inference runs on the user’s device. I don’t understand why nobody seems to be building apps around models like this.
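To make the flow concrete, here is a rough sketch in Python of what I mean. It is only an illustration: `stub_generate` is a hypothetical stand-in for a real on-device model call, and `LocalLLMServer`, `Request`, `Response`, `render` and `ask` are names I made up for this post, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    prompt: str


@dataclass
class Response:
    text: str
    ran_on_device: bool


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in for an on-device model call (assumption, not a real API)."""
    return f"[local model reply to: {prompt}]"


class LocalLLMServer:
    """Plays the 'server' role, but lives inside the app on the user's device."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate

    def handle(self, request: Request) -> Response:
        # No network hop: inference runs in-process on the device.
        return Response(text=self._generate(request.prompt), ran_on_device=True)


def render(response: Response) -> str:
    """The 'view' step: turn the response into something the UI can show."""
    return f"Assistant: {response.text}"


def ask(server: LocalLLMServer, prompt: str) -> str:
    """Client: Request -> local server -> Response -> view."""
    return render(server.handle(Request(prompt=prompt)))


if __name__ == "__main__":
    server = LocalLLMServer(stub_generate)
    print(ask(server, "Summarize this chapter"))
```

In a real app you would swap `stub_generate` for whatever on-device runtime you use; the rest of the app keeps talking to the same `handle` interface, so it would not need to know whether the "server" is local or remote.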