AI RESEARCH
The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives
arXiv CS.LG
arXiv:2605.11361v1 Announce Type: new Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. Since there is no canonical distributional distance for this closeness constraint, different choices lead to different "reward-aligned" laws and, just as importantly, different algorithmic problems.
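To make the dependence on the closeness measure concrete, here is a minimal sketch of one familiar choice, not necessarily the one the paper emphasizes. If closeness is measured by KL divergence with a regularization strength $\lambda > 0$ (an assumed parameter, not named in the abstract), then maximizing $\mathbb{E}_q[r] - \lambda\,\mathrm{KL}(q \,\|\, p)$ over laws $q$ gives the exponentially tilted law $q(x) \propto p(x)\exp(r(x)/\lambda)$. The toy example below evaluates that tilt on a three-point state space; the function names and the numbers are hypothetical and purely illustrative.

```python
import math

def kl_tilted_law(p, r, lam):
    """Return q with q[i] proportional to p[i] * exp(r[i] / lam).

    Assumption: closeness to the base law p is measured by KL(q || p),
    weighted by lam > 0. Other divergences give different aligned laws.
    """
    r_max = max(r)  # subtract the max reward for numerical stability
    weights = [pi * math.exp((ri - r_max) / lam) for pi, ri in zip(p, r)]
    z = sum(weights)  # normalizing constant
    return [w / z for w in weights]

def expected_reward(q, r):
    """Expected reward of a discrete law q."""
    return sum(qi * ri for qi, ri in zip(q, r))

# Hypothetical base law and reward on three states.
p = [0.5, 0.3, 0.2]
r = [0.0, 1.0, 2.0]

for lam in (10.0, 1.0, 0.1):
    q = kl_tilted_law(p, r, lam)
    print(lam, [round(qi, 4) for qi in q], round(expected_reward(q, r), 4))
```

A large $\lambda$ keeps the aligned law near $p$, while a small $\lambda$ concentrates mass on the highest-reward state. Swapping KL for a different divergence would change the formula inside `kl_tilted_law`, which is the sense in which each closeness choice defines its own aligned law.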