Quantile-Coupled Flow Matching for Distributional Reinforcement Learning

arXiv:2605.08515v1 Announce Type: new Unlike standard expected-return Reinforcement Learning (RL), Distributional RL (DRL) models the full return distribution, making it better suited to uncertainty-aware and risk-sensitive decision-making. Conditional Flow Matching (CFM) critics have recently attracted attention for modelling continuous, multi-modal return distributions.
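
To illustrate why modelling the full return distribution can matter for risk-sensitive decisions, here is a minimal sketch that is not taken from the paper. The two sample sets, the `cvar` helper, and the level `alpha = 0.25` are all hypothetical. It compares a unimodal and a bimodal return distribution that share the same mean but differ sharply in their lower tail.

```python
import math


def mean(samples):
    """Expected return estimated from samples."""
    return sum(samples) / len(samples)


def cvar(samples, alpha):
    """Lower-tail CVaR: mean of the worst ceil(alpha * n) sampled returns."""
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Hypothetical return samples for two actions (illustrative values only).
safe = [4.0, 5.0, 5.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0, 5.0]      # unimodal
risky = [-10.0, 20.0, -10.0, 20.0, 20.0,
         -10.0, 20.0, -10.0, -10.0, 20.0]                      # bimodal

alpha = 0.25  # assumed risk level for this sketch

# An expected-return critic sees identical values for both actions.
print("mean:", mean(safe), mean(risky))

# A distributional critic exposes the difference in the lower tail.
print("CVaR:", cvar(safe, alpha), cvar(risky, alpha))
```

An agent that ranks actions by mean alone cannot separate the two, whereas a tail-based criterion such as CVaR prefers the unimodal one. The bimodal case is the kind of multi-modal return distribution that a critic would need to represent.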