Distributional Off-Policy Evaluation with Deep Quantile Process Regression

arXiv:2604.18143v1 Announce Type: cross This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we