Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

arXiv:2605.01482v1 Announce Type: new Multi-Hop Fact Verification (MHFV) requires complex reasoning across disparate pieces of evidence, posing significant challenges for Large Language Models (LLMs), which often suffer from hallucinations and fractured logical chains. Existing methods improve transparency via Chain-of-Thought (CoT) but lack explicit modeling of the causal dependencies between evidence and claims. In this work, we