Evaluating LLMs as Human Surrogates in Controlled Experiments

ArXi:2604.15329v1 Announce Type: cross Large language models (LLMs) are increasingly used to simulate human responses in behavioral research, yet it remains unclear when LLM-generated data the same experimental inferences as human data. We evaluate this by directly comparing off-the-shelf LLM-generated responses with human responses from a canonical survey experiment on accuracy perception. Each human observation is converted into a structured prompt, and models generate a single 0--10 outcome variable without task-specific.