Flow Matching for Offline Reinforcement Learning with Discrete Actions

arXiv:2602.06138v2 Announce Type: replace Generative policies based on diffusion models and flow matching have shown strong promise for offline reinforcement learning (RL), but their applicability remains largely confined to continuous action spaces. To address a broader range of offline RL settings, we extend flow matching to a general framework that supports discrete action spaces with multiple objectives. Specifically, we replace continuous flows with continuous-time Markov chains, trained using a Q-weighted flow matching objective.
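
To make the last sentence concrete, here is a minimal sketch of what a Q-weighted objective and a continuous-time Markov chain sampler over discrete actions could look like. It is not the paper's algorithm: the abstract does not give the objective or the probability path, so the masked source state, the batch-softmax weighting exp(Q / temperature), and the names `q_weighted_fm_loss` and `sample_ctmc` are all assumptions made for illustration.

```python
import numpy as np

MASK = -1  # hypothetical "noise" state the chain starts from (an assumption)


def q_weighted_fm_loss(logits, actions, q_values, temperature=1.0):
    """Cross-entropy on dataset actions, weighted by a softmax of Q over the batch.

    logits:   (batch, n_actions) model scores for the clean action
    actions:  (batch,) integer dataset actions
    q_values: (batch,) critic estimates Q(s, a) for those actions
    The exp(Q / temperature) weighting is an assumed form, not taken from the paper.
    """
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(actions)), actions]
    # Normalised weights so that high-Q samples dominate the loss.
    weights = np.exp((q_values - q_values.max()) / temperature)
    weights = weights / weights.sum()
    return float((weights * nll).sum())


def sample_ctmc(probs, n_steps=10, rng=None):
    """Euler simulation of a chain that jumps from MASK to an action.

    Under a masked path the jump rate at time t is 1 / (1 - t); with step
    size h = 1 / n_steps the per-step jump probability h / (1 - t) equals
    1 / (n_steps - k), which reaches 1 on the final step.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = MASK
    for k in range(n_steps):
        if state == MASK and rng.random() < 1.0 / (n_steps - k):
            # Jump target is drawn from the model's predicted action distribution.
            state = int(rng.choice(len(probs), p=probs))
    return state
```

With a single discrete action the chain makes one jump, so the sampler reduces to drawing from the predicted distribution at a random time; the time-stepped form only matters when several action dimensions are unmasked progressively.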