AI RESEARCH

Mull-Tokens: Modality-Agnostic Latent Thinking

arXiv CS.AI

arXiv:2512.10941v2 Announce Type: replace-cross

Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much that words alone cannot convey. Existing multimodal models exploring the potential of reasoning with images are brittle and do not scale. They rely on calling specialist tools, costly generation of images, or handcrafted reasoning data to switch between text and image thoughts.