Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

arXiv:2605.13435v1 Announce Type: cross There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains challenging, as naive gradient-based optimization requires backpropagating through numerical solvers and often leads to instability.
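
To make "backpropagating through numerical solvers" concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not the method proposed in the paper: it assumes a scalar action and a hypothetical linear velocity field v(a) = theta * a, integrated with explicit Euler steps over t in [0, 1]. The function names `euler_rollout` and `grad_through_solver` are invented for this example.

```python
def euler_rollout(theta, a0, num_steps):
    """Integrate da/dt = theta * a from t=0 to t=1 with explicit Euler steps.

    Assumption: a toy scalar, linear velocity field (hypothetical, for illustration).
    Returns every intermediate action, since the backward pass needs them.
    """
    h = 1.0 / num_steps
    actions = [a0]
    for _ in range(num_steps):
        a = actions[-1]
        actions.append(a + h * theta * a)  # one Euler step: a_{k+1} = a_k + h * v(a_k)
    return actions


def grad_through_solver(theta, a0, num_steps):
    """Reverse-mode gradient of the final action with respect to theta.

    The backward pass walks the solver steps in reverse and multiplies the
    adjoint by each step's Jacobian, so the gradient contains a product of
    num_steps factors.
    """
    h = 1.0 / num_steps
    actions = euler_rollout(theta, a0, num_steps)
    adjoint = 1.0      # derivative of the final action with respect to itself
    grad_theta = 0.0
    for k in reversed(range(num_steps)):
        grad_theta += adjoint * h * actions[k]  # direct effect of theta at step k
        adjoint *= 1.0 + h * theta              # Jacobian of step k w.r.t. a_k
    return grad_theta


if __name__ == "__main__":
    for theta in (-2.0, 2.0, 8.0):
        print(theta, grad_through_solver(theta, a0=1.0, num_steps=10))
```

In this toy case the gradient equals a0 * (1 + theta / K)^(K - 1) for K steps, so it grows or shrinks geometrically with the per-step Jacobian. With a neural velocity field each factor becomes a matrix, and all intermediate states must be stored for the backward pass, which is one plausible reading of the cost and instability the abstract refers to.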