Generative Actor-Critic with Soft Bridge Policies

arXiv:2605.08733v1

Expressive generative policies, such as diffusion and flow models, are appealing for maximum-entropy (MaxEnt) online reinforcement learning because they can represent multimodal and highly non-Gaussian action distributions. However