Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

arXiv:2605.11128v1 Announce Type: new Diversity is essential for language-model applications ranging from creative generation to scientific discovery, yet modern LLMs often collapse into a narrow subset of plausible outputs. While prior work has developed benchmarks for measuring this lack of diversity, less is known about how the step-by-step probability distributions at inference time cause the problem. We