MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation

arXiv:2604.06966v1 Announce Type: new

Reinforcement learning (RL) has been successfully applied to autoregressive (AR) and diffusion models. However, extending RL to hybrid AR-diffusion frameworks remains challenging because of interleaved inference and noisy log-probability estimation. In this work, we study masked autoregressive models (MAR) and show that the diffusion head plays a critical role in