AI RESEARCH
TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
arXiv cs.CV
arXiv:2603.07700v1 Announce Type: new While few-step generative models have enabled powerful image and video generation at significantly lower cost, a general reinforcement learning (RL) paradigm for few-step models remains an open problem. Existing RL approaches for few-step diffusion models rely heavily on back-propagating through differentiable reward models, which excludes most important real-world reward signals, e.g., non-differentiable rewards such as binary human like/dislike judgments and object counts.