Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

arXiv:2509.02510v2 Announce Type: replace-cross Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-$p$ (nucleus) sampling, and min-$p$ sampling, aim to manage this trade-off.
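
For readers unfamiliar with the baselines named above, the following is a minimal sketch of how temperature scaling, top-$p$ filtering, and min-$p$ filtering are commonly defined over a next-token distribution. It illustrates the existing techniques only, not the top-H method of the title. The function names (`softmax`, `top_p_filter`, `min_p_filter`, `renormalize`) and the example probabilities are hypothetical choices made for illustration, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, kept):
    """Zero out tokens not in `kept` and rescale the rest to sum to 1."""
    kept_set = set(kept)
    total = sum(probs[i] for i in kept_set)
    return [probs[i] / total if i in kept_set else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p=0.9):
    """Nucleus sampling: keep the smallest high-probability set with mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, kept)


def min_p_filter(probs, min_p=0.1):
    """Min-p: keep tokens whose probability is at least min_p * max probability."""
    threshold = min_p * max(probs)
    kept = [i for i, q in enumerate(probs) if q >= threshold]
    return renormalize(probs, kept)


# Hypothetical 4-token distribution, used only to exercise the filters.
example = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(example, p=0.9)
floor = min_p_filter(example, min_p=0.4)
```

In this sketch each filter returns a renormalized distribution of the same length, so a sampler can draw from it directly. Top-$p$ fixes the retained probability mass, whereas min-$p$ scales its cutoff with the probability of the most likely token, so the two rules generally retain different numbers of tokens.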