AI RESEARCH

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

arXiv CS.LG

arXiv:2603.09221v1 Announce Type: new

Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time