Causal Bandit Over Unknown Graphs: Upper Confidence Bounds With Backdoor Adjustment

arXiv:2502.02020v3

The causal bandit problem seeks to identify, through sequential experimentation, an intervention that maximizes the expected reward in a causal system modeled by a directed acyclic graph (DAG). Existing methods typically assume that the causal graph is known or impose restrictive structural assumptions. In this paper, we study causal bandit problems in which the causal graph is unknown. We first consider Gaussian DAG models without latent confounders.