Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation

arXiv:2603.18428v1 Announce Type: cross Decoding strategies largely determine the quality of Large Language Model (LLM) outputs, yet widely used heuristics such as greedy or fixed temperature/top-p decoding are static and often task-agnostic, leading to suboptimal or inconsistent generation quality across domains that demand stylistic or structural flexibility. We