Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning

arXiv:2605.11289v1 Announce Type: new Average-reward reinforcement learning requires estimating the gain and the bias, which is defined only up to an additive constant. This makes direct distributional analogues ill-posed on the real line. We