Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models

arXiv:2604.02485v1 Announce Type: cross Confirmation bias, the tendency to seek evidence that supports rather than challenges one's belief, hinders one's reasoning ability. We examine whether large language models (LLMs) exhibit confirmation bias by adapting the rule-discovery study from human psychology: given a sequence of three numbers (a "triple"), an agent engages in an interactive feedback loop in which it (1) proposes a new triple, (2) receives feedback on whether it satisfies the hidden rule, and (3) guesses the rule.
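
The feedback loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the hidden rule ("strictly increasing"), the seed triple (2, 4, 6), the agent's working hypothesis ("each number increases by 2"), and all function names are assumptions chosen for illustration. The rule-guessing step is simplified to a fixed hypothesis that is checked against the accumulated feedback.

```python
from typing import Callable, List, Tuple

Triple = Tuple[int, int, int]
History = List[Tuple[Triple, bool]]  # (proposed triple, feedback from hidden rule)


def hidden_rule(t: Triple) -> bool:
    """Assumed hidden rule for this sketch: strictly increasing numbers."""
    return t[0] < t[1] < t[2]


def hypothesis(t: Triple) -> bool:
    """Assumed working belief of the agent: each number increases by 2."""
    return t[1] - t[0] == 2 and t[2] - t[1] == 2


def confirming_proposer(history: History) -> Triple:
    """Only proposes triples that the hypothesis predicts will pass."""
    start = history[-1][0][0] + 2
    return (start, start + 2, start + 4)


def falsifying_proposer(history: History) -> Triple:
    """Proposes triples that the hypothesis predicts will fail."""
    probes = [(1, 2, 3), (3, 2, 1), (5, 5, 5), (1, 10, 100)]
    return probes[(len(history) - 1) % len(probes)]


def run_loop(propose: Callable[[History], Triple], seed: Triple, turns: int) -> History:
    """Steps (1) and (2): propose a triple, then record the rule's feedback."""
    history: History = [(seed, True)]  # the seed is known to satisfy the rule
    for _ in range(turns):
        triple = propose(history)
        history.append((triple, hidden_rule(triple)))
    return history


def hypothesis_refuted(history: History) -> bool:
    """Simplified step (3): does any feedback contradict the hypothesis?"""
    return any(hypothesis(t) != feedback for t, feedback in history)


confirming_history = run_loop(confirming_proposer, (2, 4, 6), turns=3)
falsifying_history = run_loop(falsifying_proposer, (2, 4, 6), turns=3)
```

In this sketch the confirming strategy never receives feedback that contradicts its hypothesis, so it has no reason to revise an incorrect guess, while the falsifying strategy exposes the mismatch on its first probe.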