Liquid AI's LFM2-24B-A2B running at ~50 tokens/second in a web browser on WebGPU

r/LocalLLaMA
Generative AI

The model (MoE with 24B total and 2B active parameters) runs at ~50 tokens per second on my M4 Max, and the 8B-A1B variant runs at over 100 tokens per second on the same hardware.

(+ source code):

Optimized ONNX models:
-
-

Submitted by /u/xenovatech
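For anyone who wants to sanity-check tokens-per-second numbers like these on their own hardware, here is a minimal sketch of one way to measure decode throughput. It is not taken from the demo's source code: `tokensPerSecond` and `ThroughputMeter` are hypothetical names, and the sketch assumes you can call `onToken` from whatever per-token streaming callback your runtime exposes. It measures decode speed only, so the clock starts at the first generated token and the first token itself is not counted.

```typescript
// Hypothetical helper (not from the demo): convert a token count and an
// elapsed time in milliseconds into tokens per second.
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  return (tokenCount * 1000) / elapsedMs;
}

// Hypothetical meter: feed it a timestamp (in ms) each time a token arrives.
class ThroughputMeter {
  private firstTokenMs: number | null = null;
  private lastTokenMs: number | null = null;
  private tokensAfterFirst = 0;

  // Call once per generated token, e.g. with performance.now().
  onToken(nowMs: number): void {
    if (this.firstTokenMs === null) {
      // The first token starts the clock and is excluded from the count,
      // so prompt processing time does not skew the decode rate.
      this.firstTokenMs = nowMs;
    } else {
      this.tokensAfterFirst += 1;
    }
    this.lastTokenMs = nowMs;
  }

  // Decode rate in tokens/second, or null if fewer than two tokens were seen.
  rate(): number | null {
    if (this.firstTokenMs === null || this.lastTokenMs === null) return null;
    const elapsedMs = this.lastTokenMs - this.firstTokenMs;
    if (this.tokensAfterFirst === 0 || elapsedMs <= 0) return null;
    return tokensPerSecond(this.tokensAfterFirst, elapsedMs);
  }
}

// Usage with synthetic timestamps: one token every 20 ms.
const meter = new ThroughputMeter();
for (const t of [0, 20, 40, 60]) {
  meter.onToken(t);
}
console.log(meter.rate());
```

In a real run you would pass `performance.now()` instead of the synthetic timestamps. Short generations give noisy numbers, so it is worth averaging over a few hundred tokens.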