Controllable User Simulation

arXiv:2605.11519v1 Announce Type: new Using offline datasets to evaluate conversational agents often fails to cover rare scenarios or to test new policies. This has motivated the use of controllable user simulators for targeted, counterfactual evaluation, typically implemented by prompting or fine-tuning large language models. In this work, we formalize controllable simulation as a causal inference problem. By bridging natural language evaluation with off-policy evaluation methodology, we show that the standard practice of.