Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

arXiv:2602.13810v2 Announce Type: replace Learning expressive and efficient policy functions is a promising direction in reinforcement learning (RL). While flow-based policies have recently proven effective in modeling complex action distributions with a fast deterministic sampling process, they still face a trade-off between expressiveness and computational burden, which is typically controlled by the number of flow steps.
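
To make the step-count trade-off concrete, here is a minimal sketch, not the paper's method: a generic fixed-step Euler sampler that integrates a velocity field from t = 0 to t = 1. The names `euler_sample`, `CountingVelocity`, and `run` are hypothetical, and the toy field v(a, t) = -a is an assumption standing in for a learned velocity network; it is chosen only because its exact endpoint, a0 * exp(-1), is known.

```python
import math


def euler_sample(velocity, a0, num_steps):
    """Integrate da/dt = velocity(a, t) from t=0 to t=1 with fixed-step Euler."""
    a = a0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # One velocity evaluation per flow step (a network call in practice).
        a = a + dt * velocity(a, t)
    return a


class CountingVelocity:
    """Toy stand-in for a learned velocity network; counts its evaluations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, t):
        self.calls += 1
        return -a  # assumed toy field; exact solution is a(1) = a0 * exp(-1)


def run(num_steps, a0=1.0):
    """Return (sample, number of velocity evaluations, absolute error)."""
    v = CountingVelocity()
    a1 = euler_sample(v, a0, num_steps)
    error = abs(a1 - a0 * math.exp(-1.0))
    return a1, v.calls, error


if __name__ == "__main__":
    for steps in (1, 2, 10, 100):
        sample, calls, error = run(steps)
        print(f"steps={steps:3d} evaluations={calls:3d} error={error:.4f}")
```

In this toy setting the number of velocity evaluations equals the number of flow steps, while the integration error shrinks as steps are added. That is the cost-versus-fidelity tension the abstract refers to, shown here only for a scalar toy field rather than an actual policy.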