Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

arXiv:2603.27666v1

Recent advances in diffusion-based controllable visual generation have led to remarkable improvements in image quality. However, these powerful models are typically deployed on cloud servers because of their large computational demands, which raises serious concerns about user data privacy. To enable secure and efficient on-device generation, in this paper we explore controllable diffusion models built on linear-attention architectures, which offer superior scalability and efficiency, even on edge devices.
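The abstract does not specify which linear-attention formulation the authors build on. The sketch below is only a generic illustration of why linear attention scales better than softmax attention. It assumes the common kernelized form with an elementwise positive feature map phi(x) = elu(x) + 1. It is not the paper's gated condition injection method, and the function names are hypothetical. Both functions compute the same output. The quadratic version forms every query-key similarity, at a cost of O(N^2 d). The linear version summarizes all keys and values once into a fixed-size state (S, z) and reuses that state for every query, at a cost of O(N d d_v).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def elu_plus_one(x: Vector) -> Vector:
    """Assumed feature map phi(x) = elu(x) + 1; every entry is strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def quadratic_kernel_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """O(N^2) reference: computes a similarity for every (query, key) pair."""
    fk = [elu_plus_one(k) for k in K]
    dv = len(V[0])
    out = []
    for q in Q:
        fq = elu_plus_one(q)
        weights = [sum(a * b for a, b in zip(fq, k)) for k in fk]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total for c in range(dv)])
    return out


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """O(N) form: accumulate a fixed-size summary of keys/values, then reuse it."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_j phi(k_j) v_j^T, shape d x dv
    z = [0.0] * d                       # z = sum_j phi(k_j), normalizer state
    for k, v in zip(K, V):
        fk = elu_plus_one(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = elu_plus_one(q)
        denom = sum(fq[a] * z[a] for a in range(d))  # positive because phi > 0
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom for c in range(dv)])
    return out


if __name__ == "__main__":
    Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.4]]
    K = [[0.2, 0.1], [-0.3, 0.7], [0.9, -0.5]]
    V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
    print(quadratic_kernel_attention(Q, K, V))
    print(linear_attention(Q, K, V))  # matches the line above up to rounding
```

The state (S, z) has size d × d_v plus d no matter how many tokens are processed. That fixed memory footprint is one reason linear-attention backbones are attractive for on-device generation.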