Last Week in Multimodal AI - Local Edition

r/LocalLLaMA

I curate a weekly multimodal AI roundup. Here are the local/open-source highlights from last week:

FlashMotion - Controllable Video Generation
Few-step video gen on Wan2.2-TI2V with multi-object box/mask guidance. 50x speedup over SOTA. Weights available.
Project | Weights

Foundation 1 - Music Production Model
Text-to-sample model built for music workflows. Runs on 7 GB VRAM.
Post | Weights

GlyphPrinter - Accurate Text Rendering for Image Gen
Glyph-accurate multilingual text rendering for text-to-image models. Handles complex Chinese characters. Open weights.