Latent Action Control for Reasoning-Guided Unified Image Generation

ArXi:2605.16961v1 Announce Type: new Unified multimodal models can encode visual understanding and image generation within a shared backbone, yet understanding does not automatically translate into control: models may infer objects, relations, or knowledge cues but fail to instantiate them in the generated image. We propose Latent Action Control (LAC), which makes reasoning actionable by representing it as hidden continuous actions inside a unified generator.