Last Week in Multimodal AI - Local Edition

r/LocalLLaMA

I curate a weekly multimodal AI roundup. Here are the local/open-source highlights from the last week:

Google Gemma 4 - Open model family for coding and logical reasoning with a massive context window. Runs on a single machine. Post | Models

TII Falcon Perception - 0.6B early-fusion VLM with open-vocabulary grounding, segmentation, and OCR. Punches way above its weight. Post | Hugging Face

IBM Granite 4.0 3B Vision - Compact document intelligence model for visual reasoning and data extraction.