Parameter-Efficient Distributional RL via Normalizing Flows and a Geometry-Aware Cram\'er Surrogate

ArXi:2505.04310v2 Announce Type: replace-cross Distributional Reinforcement Learning (DistRL) improves upon expectation-based methods by modeling full return distributions, but standard approaches often remain far from parsimonious. Categorical methods (e.g., C51) rely on fixed s where parameter counts scale linearly with resolution, while quantile methods approximate distributions as discrete mixtures whose piecewise-constant densities can be wasteful when modeling complex multi-modal or heavy-tailed returns. We.