Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing

arXiv:2507.11780v2 Announce Type: replace-cross Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches to inference are not directly applicable.
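
To make the non-differentiability concrete, the sketch below compares a plug-in optimal value (the average over units of the best arm's mean outcome) with a softmax-weighted surrogate. It is an illustration only, not the paper's estimator: the softmax-weighted average is one common way to smooth a maximum, and the names `softmax_value`, `smoothed_value`, `optimal_value`, the temperature `beta`, and the toy numbers in `rows` are all assumptions introduced here.

```python
import math


def softmax_value(means, beta):
    """Softmax-weighted average of arm means; approaches max(means) as beta grows."""
    m = max(means)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(beta * (v - m)) for v in means]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, means)) / total


def optimal_value(rows):
    """Plug-in optimal value: average over units of the best arm's mean outcome."""
    return sum(max(r) for r in rows) / len(rows)


def smoothed_value(rows, beta):
    """Smooth surrogate: replace each per-unit max with its softmax-weighted average."""
    return sum(softmax_value(r, beta) for r in rows) / len(rows)


# Hypothetical conditional mean outcomes: one row per covariate profile,
# one column per treatment arm. The second row is an exact tie between arms,
# which is where the max has a kink and is not differentiable.
rows = [[1.0, 0.2], [0.5, 0.5], [0.1, 0.9]]

for beta in (1.0, 10.0, 100.0):
    print(beta, smoothed_value(rows, beta), optimal_value(rows))
```

The surrogate is differentiable in the arm means for any finite `beta` and never exceeds the true maximum; larger `beta` reduces the smoothing bias at the cost of a surrogate that is closer to the kinked maximum.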