Nemotron-3-Nano (4B), new hybrid Mamba + Attention model from NVIDIA, running locally in your browser on WebGPU.

r/LocalLLaMA
AI Hardware

I haven't seen many people talking about NVIDIA's new Nemotron-3-Nano model, which was released just a couple of days ago, so I decided to build a WebGPU demo for it! Everything runs locally in your browser (using Transformers.js). On my M4 Max, I get ~75 tokens per second - not bad! It's a 4B hybrid Mamba + Attention model, designed to handle both reasoning and non-reasoning tasks. Link to demo (+ source code):

submitted by /u/xenovatech