Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

arXiv:2605.10430v1 Announce Type: cross Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industrial practice. However, the two communities often evaluate models under markedly different conditions. Methodological work typically relies on semi-simulated benchmarks and metrics that require counterfactual outcomes, whereas real-world applications depend on observable metrics based on ranking or test outcomes.