AI RESEARCH

Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

arXiv cs.LG

arXiv:2604.20500v1

Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting on the final answer. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and producing duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement.
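To make the idea of enumerating distinct leaves concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the traversal order or the truncation rule, so best-first expansion by cumulative log-probability and top-k truncation are assumptions chosen for illustration. The names toy_next_token_probs, truncate_top_k, and enumerate_distinct_leaves are hypothetical, and the toy distribution stands in for a real language model.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Token = str
Prefix = Tuple[Token, ...]
EOS = "<eos>"


def toy_next_token_probs(prefix: Prefix) -> Dict[Token, float]:
    """Hypothetical stand-in for a model's next-token distribution."""
    if len(prefix) >= 2:
        return {EOS: 1.0}  # force every sequence to end after two tokens
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_k(probs: Dict[Token, float], k: int) -> Dict[Token, float]:
    """Keep the k most probable tokens and renormalise (top-k truncation)."""
    kept = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def enumerate_distinct_leaves(
    next_probs: Callable[[Prefix], Dict[Token, float]],
    k: int,
    budget: int,
) -> List[Tuple[Prefix, float]]:
    """Deterministically visit leaves of the truncated decoding tree.

    Assumption: prefixes are expanded best-first by cumulative probability.
    Each prefix enters the heap once, so no leaf is returned twice.
    """
    heap: List[Tuple[float, Prefix]] = [(0.0, ())]  # (negative log-prob, prefix)
    leaves: List[Tuple[Prefix, float]] = []
    while heap and len(leaves) < budget:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == EOS:
            leaves.append((prefix, math.exp(-neg_logp)))  # completed sequence
            continue
        for tok, p in truncate_top_k(next_probs(prefix), k).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (tok,)))
    return leaves


leaves = enumerate_distinct_leaves(toy_next_token_probs, k=2, budget=10)
for seq, prob in leaves:
    print(" ".join(seq), round(prob, 3))
```

With k=2 the toy tree has four leaves, and the enumeration returns each exactly once, whereas drawing four samples with replacement from the same truncated distribution could return the most probable leaf several times. A voting step, as in self-consistency, could then weight each distinct leaf by its probability; that weighting is likewise an assumption here, not something the abstract states.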