SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

arXiv:2510.17516v4

Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are fragmented: they rely on bespoke tasks and metrics, which produces a patchwork of incomparable results. To address this, we