Compared to What? Baselines and Metrics for Counterfactual Prompting

arXiv:2605.01048v1 Announce Type: cross

Counterfactual prompting (i.e., perturbing a single factor and measuring the change in output) is widely used to evaluate properties such as LLM bias and chain-of-thought (CoT) faithfulness. In this work, we argue that observed effects cannot be attributed to the targeted factor without first accounting for baseline "meaning-preserving" modifications to the text, which establish the model's general sensitivity to perturbation.
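
To make the comparison concrete, here is a minimal sketch of one way a targeted effect could be set against a meaning-preserving baseline. The abstract does not name a specific metric, so the flip rate used here, the function names `flip_rate` and `excess_effect`, and the toy outputs are all illustrative assumptions, not the authors' method.

```python
def flip_rate(original, perturbed):
    """Fraction of items whose model output changed after a perturbation."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("need two non-empty output lists of equal length")
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(original)


def excess_effect(original, targeted, baseline):
    """Targeted flip rate minus the meaning-preserving baseline flip rate.

    Assumed metric (hypothetical): a value near zero suggests the targeted
    factor changes outputs no more than a harmless rewording does.
    """
    return flip_rate(original, targeted) - flip_rate(original, baseline)


# Toy outputs, invented for illustration only.
original_outputs = ["A", "B", "A", "B"]   # answers to the unmodified prompts
targeted_outputs = ["B", "B", "B", "B"]   # answers after perturbing the targeted factor
baseline_outputs = ["A", "B", "B", "B"]   # answers after a meaning-preserving rewrite

targeted_rate = flip_rate(original_outputs, targeted_outputs)
baseline_rate = flip_rate(original_outputs, baseline_outputs)
excess = excess_effect(original_outputs, targeted_outputs, baseline_outputs)
print(targeted_rate, baseline_rate, excess)
```

In this toy case, half of the outputs change under the targeted perturbation, but a quarter also change under the meaning-preserving rewrite, so only the difference between the two is a candidate for attribution to the targeted factor.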