AI RESEARCH
Fisher Decorator: Refining Flow Policy via A Local Transport Map
arXiv cs.LG
arXiv:2604.17919v1 Announce Type: new

Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, these methods still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound on the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings.
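The upper-bound reading comes from the definition $W_2^2(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \mathbb{E}_{(x,y)\sim\gamma}\|x-y\|^2$. Because $W_2^2$ is an infimum over couplings, the mean squared distance under any one fixed pairing of policy actions with dataset actions can only be larger or equal, and its square root therefore bounds $W_2$. The following is a minimal sketch of that inequality, not the Fisher Decorator method. It assumes NumPy and uses hypothetical 1-D Gaussian "actions." In 1-D with equal-size empirical samples, the optimal coupling pairs sorted samples, so $W_2^2$ can be computed exactly and compared against the index-paired $L_2$ penalty.

```python
import numpy as np

def mean_sq_cost(x, y):
    # Average squared distance when x[i] is paired with y[i] (one fixed coupling).
    return float(np.mean((x - y) ** 2))

def w2_squared_1d(x, y):
    # For equal-size 1-D empirical distributions, the optimal W2 coupling
    # matches order statistics, so W2^2 is the mean squared gap of sorted samples.
    return mean_sq_cost(np.sort(x), np.sort(y))

rng = np.random.default_rng(0)
# Hypothetical stand-ins: dataset actions and policy actions paired by index,
# mimicking an L2 penalty that compares each policy output to one fixed target.
dataset_actions = rng.normal(loc=0.0, scale=1.0, size=1000)
policy_actions = rng.normal(loc=0.5, scale=1.2, size=1000)

l2_penalty = mean_sq_cost(policy_actions, dataset_actions)
w2_sq = w2_squared_1d(policy_actions, dataset_actions)
print(f"L2 penalty under index pairing: {l2_penalty:.3f}")
print(f"W2^2 under optimal pairing:     {w2_sq:.3f}")
```

The gap between the two printed numbers shows how loose the bound can be: penalizing the index-paired $L_2$ term also penalizes mismatches that an optimal transport plan would not count.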