NVIDIA releases Cosmos 3 Omnimodal world models on HF

r/LocalLLaMA
AI Hardware

Nano: 16B, Super: 64B

Cosmos3 is a collection of omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory inputs. It serves as a foundational building block for a broad range of Physical AI applications and research spanning world understanding, world generation, simulation, and embodied policy learning.

Haven't seen much about it here yet. There has been some discussion on Twitter.

submitted by /u/RobotRobotWhatDoUSee