A simple React hook for running local LLMs via WebGPU

Running AI inference natively in the browser is the holy grail for cutting API costs and keeping enterprise data private. But if you've actually tried to build it, you know the reality is a headache. You have to manually configure WebLLM or Transformers.js, set up a dedicated Web Worker so the browser's main thread (where React renders) doesn't freeze, handle browser caching for large model files, and write custom state management just to track loading progress. That's hours of low-level boilerplate before you can generate a single token.
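To give a feel for just the last item on that list, here is a minimal sketch of the loading-progress state you end up writing by hand. It is plain TypeScript with no library imports. The `ProgressReport` shape and the names `LoadState`, `LoadAction`, and `loadReducer` are assumptions made up for illustration, not the API of WebLLM, Transformers.js, or this hook.

```typescript
// Hypothetical progress event. Real libraries report something along these
// lines, but this exact shape is an assumption for the sketch.
interface ProgressReport {
  progress: number; // fraction of the model loaded, expected in [0, 1]
  text: string;     // human-readable status message
}

// Every state the UI may need to render while a model loads.
type LoadState =
  | { status: "idle" }
  | { status: "loading"; progress: number; message: string }
  | { status: "ready" }
  | { status: "error"; error: string };

// Events that move the loader from one state to the next.
type LoadAction =
  | { type: "start" }
  | { type: "progress"; report: ProgressReport }
  | { type: "done" }
  | { type: "fail"; error: string };

function loadReducer(state: LoadState, action: LoadAction): LoadState {
  switch (action.type) {
    case "start":
      return { status: "loading", progress: 0, message: "Starting download" };
    case "progress":
      // Ignore stray progress events that arrive outside a load.
      if (state.status !== "loading") return state;
      return {
        status: "loading",
        // Never move the bar backwards, and never past 100%.
        progress: Math.min(1, Math.max(state.progress, action.report.progress)),
        message: action.report.text,
      };
    case "done":
      return { status: "ready" };
    case "fail":
      return { status: "error", error: action.error };
  }
}
```

In a component you would typically hand a reducer like this to React's `useReducer` and dispatch `progress` actions from the worker's message handler. That is one piece out of four, and it still doesn't touch the worker wiring or the cache.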