Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

arXiv:2603.07887v1

Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we