The threat of analytic flexibility in using large language models to simulate human data

arXiv:2509.13397v3 Announce Type: replace-cross Social scientists are now using large language models to create "silicon samples": synthetic datasets intended to stand in for human respondents. However, producing these samples requires many analytic choices, including model selection, sampling parameters, prompt format, and the amount of demographic or contextual information provided. Across two studies, I examine whether these choices materially affect correspondence between silicon samples and human data.