Gradient-Free Noise Optimization for Reward Alignment in Generative Models

arXiv:2605.11347v1 Announce Type: cross

Existing reward alignment methods for diffusion and flow models rely on multi-step stochastic trajectories, which makes them difficult to extend to deterministic generators. A natural alternative is noise-space optimization, but existing approaches require backpropagation through both the generator and the reward pipeline, which restricts them to settings where every component is differentiable.