Real-time video captioning in the browser with LFM2-VL on WebGPU

r/LocalLLaMA
Generative AI

The model runs 100% locally in the browser with Transformers.js. Fun fact: I had to slow down frame capture by 120 ms because the model was too fast! Once I figure out a better UX so users can follow the generated captions more easily (less jumping), we can remove that delay. Suggestions welcome!

Online (+ source code): [link]

Submitted by /u/xenovatech
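For readers wondering what that kind of throttling could look like, here is a minimal sketch. It is not the demo's actual code. The names `runCaptionLoop`, `captureFrame`, `captionFrame`, `onCaption`, and the options object are all hypothetical, and the model call is left as an injected function rather than a real Transformers.js call. The only detail taken from the post is the 120 ms pause.

```typescript
// Hypothetical sketch of a throttled caption loop. Nothing here comes from
// the demo's source; only the idea of a fixed pause between frames does.

type CaptureFrame<F> = () => F | Promise<F>;      // e.g. grab a video frame
type CaptionFrame<F> = (frame: F) => Promise<string>; // e.g. run the model

interface LoopOptions {
  delayMs: number;            // pause after each caption (120 in the post)
  maxFrames?: number;         // optional cap so the loop can terminate
  shouldStop?: () => boolean; // optional external stop check
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Captures a frame, captions it, reports the caption, then waits delayMs
// before capturing the next frame. Resolves with the number of frames done.
async function runCaptionLoop<F>(
  captureFrame: CaptureFrame<F>,
  captionFrame: CaptionFrame<F>,
  onCaption: (caption: string) => void,
  options: LoopOptions,
): Promise<number> {
  let processed = 0;
  while (options.maxFrames === undefined || processed < options.maxFrames) {
    if (options.shouldStop?.()) break;
    const frame = await captureFrame();
    const caption = await captionFrame(frame);
    onCaption(caption);
    processed++;
    // Give the viewer time to read the caption before it changes.
    await sleep(options.delayMs);
  }
  return processed;
}
```

In a browser, `captureFrame` might draw the current video frame to a canvas and `captionFrame` might wrap the model call, with `delayMs: 120` to match the post. Making the delay an option means it can be tuned or set to 0 once the caption display is easier to follow.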