NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise

arXiv:2605.04313v1 Announce Type: cross Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities, they struggle to disentangle correlation from causation, particularly when observations are partially incorrect or irrelevant information is present. In this work, we