AI RESEARCH

The Diminishing Returns of Early-Exit Decoding in Modern LLMs

arXiv CS.CL

arXiv:2603.23701v1 Announce Type: new

In large language model (LLM) inference, early-exit refers to stopping computation at an intermediate layer once the prediction is sufficiently confident, thereby reducing latency and cost. However, recent LLMs adopt improved pre