[R] We tested whether LLMs apply the same evidential standard to positive and null results. They don’t.

We ran matched-pair experiments across GPT-4o, GPT-5.2 Thinking, and Claude Haiku 4.5. Each experiment presented two versions of the same fictional study: one reporting a statistically significant positive result, the other reporting a null result. Evidence quality, sample size, and methodology were held constant. Only the