Last Week in Multimodal AI - Local Edition
r/LocalLLaMA
I curate a weekly multimodal AI roundup. Here are the local/open-source highlights from the last week:

- Google Gemma 4 - Open model family for coding and logical reasoning with a massive context window. Runs on a single machine. (Post | Models)
- TII Falcon Perception - 0.6B early-fusion VLM with open-vocabulary grounding, segmentation, and OCR. Punches way above its weight. (Post | Hugging Face)
- IBM Granite 4.0 3B Vision - Compact document intelligence model for visual reasoning and data extraction.