Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

arXiv:2604.09159v1 Announce Type: new Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its conventional Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions. This limitation has motivated increasing interest in generative policies based on diffusion and flow matching as expressive alternatives.