Liquid AI releases LFM2.5-VL-450M - structured visual understanding at 240ms
r/LocalLLaMA
Today, we release LFM2.5-VL-450M, our most capable vision-language model for edge deployment. It processes a 512×512 image in 240 ms, fast enough to reason about every frame in a 4 FPS video stream (a 250 ms budget per frame). It builds on LFM2-VL-450M with three new capabilities: bounding box prediction (81.28 on RefCOCO-M), multilingual visual understanding across 9 languages (MMMB: 54.29 → 68.09), and function calling. Most production vision systems are still multi-stage: a detector, a classifier, and heuristic logic on top.
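To make the 4 FPS claim concrete, here is a rough back-of-the-envelope sketch of the frame budget. It assumes frames are processed one at a time and that the 240 ms figure covers the whole per-frame cost (no extra decode or preprocessing time). The function names are made up for illustration and are not part of any Liquid AI API.

```python
def max_sustained_fps(latency_ms: float) -> float:
    """Highest frame rate at which each frame finishes before the next one
    arrives, assuming strictly sequential processing (an assumption)."""
    return 1000.0 / latency_ms


def frame_headroom_ms(latency_ms: float, fps: float) -> float:
    """Time left over per frame at the given rate; negative means the
    pipeline falls behind and has to drop or queue frames."""
    return 1000.0 / fps - latency_ms


latency_ms = 240.0  # per-image latency quoted in the post
target_fps = 4.0    # stream rate quoted in the post

print(f"max sustained FPS: {max_sustained_fps(latency_ms):.2f}")
print(f"headroom at {target_fps:.0f} FPS: {frame_headroom_ms(latency_ms, target_fps):.0f} ms")
```

At 4 FPS each frame gets 250 ms, so 240 ms leaves about 10 ms of slack per frame. At 5 FPS the budget drops to 200 ms and the same latency would no longer keep up.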