Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

arXiv:2605.07721v1 Announce Type: cross Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating internal representations while retaining a standard Key-Value (KV) cache across iterations, causing memory consumption to grow linearly with reasoning depth.
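
To make the linear growth concrete, the following is a rough back-of-the-envelope sketch, not the authors' accounting. It assumes that every loop iteration keeps its own full set of keys and values for every layer and token. The function name `kv_cache_bytes` and all hyperparameter values are hypothetical and chosen only for illustration; they are not taken from Ouro or from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   num_loops, bytes_per_elem=2):
    """Estimate KV-cache size in bytes for a looped transformer.

    Assumption (illustrative only): each loop iteration retains its own
    keys and values, so the cache scales with num_loops.
    """
    # Keys and values (factor 2) stored per token, per layer.
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_elem
    # Cache needed for a single pass over the layer stack.
    per_loop = num_layers * seq_len * per_token_per_layer
    # One retained cache per loop iteration gives linear growth in depth.
    return num_loops * per_loop


if __name__ == "__main__":
    # Hypothetical configuration: 24 layers, 8 KV heads of dimension 128,
    # a 4096-token context, 16-bit cache entries.
    for loops in (1, 2, 4, 8):
        size = kv_cache_bytes(24, 8, 128, 4096, loops)
        print(f"loops={loops}: {size / 2**20:.0f} MiB")
```

Under these assumptions, doubling the number of loop iterations doubles the cache, while the parameter count stays fixed; this is the coupling between compute depth and memory that the title refers to.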