AI RESEARCH
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
arXiv CS.AI
arXiv:2605.06723v1 Announce Type: new Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: \emph{finite-answer preference stabilization}. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $\delta(\xi)=S_\theta(\mathrm{yes}\mid\xi)-S_\theta(\mathrm{no}\mid\xi)$.
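
The projection and the binary log-odds code described above can be illustrated with a minimal sketch. It assumes that $S_\theta(a\mid\xi)$ is the log-probability the model assigns to the verbalizer of answer $a$ at state $\xi$; the excerpt does not define $S_\theta$, so this reading is an assumption. The function names `project_onto_answers` and `binary_log_odds` and the example scores are hypothetical and are not taken from the paper.

```python
import math


def project_onto_answers(scores):
    """Renormalize per-answer log-scores into a distribution over the answer set.

    `scores` maps each answer label to an assumed log-score S(a | state).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores.values())
    unnormalized = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(unnormalized.values())
    return {a: u / z for a, u in unnormalized.items()}


def binary_log_odds(scores, pos="yes", neg="no"):
    """Binary log-odds: delta = S(yes | state) - S(no | state)."""
    return scores[pos] - scores[neg]


# Illustrative scores only: the verbalizers receive 0.6 and 0.2 of the
# continuation mass, and the remaining mass falls outside the answer set.
scores = {"yes": math.log(0.6), "no": math.log(0.2)}

probs = project_onto_answers(scores)
delta = binary_log_odds(scores)

# The log-odds of the projected distribution coincide with delta.
projected_log_odds = math.log(probs["yes"] / probs["no"])
```

Under this assumption, the renormalized two-answer distribution is determined entirely by $\delta$: the projected probability of "yes" equals the logistic function of $\delta$, which is one way to read the claim that the binary case yields an exact log-odds code.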