AI RESEARCH
Vision-TTT: Efficient and Expressive Visual Representation Learning with Test-Time Training
arXiv CS.CV
arXiv:2603.00518v2 Announce Type: replace Learning efficient and expressive visual representations has long been a central goal of computer vision research. While Vision Transformers (ViTs) are gradually replacing traditional Convolutional Neural Networks (CNNs) as scalable vision learners, their application is hampered by the quadratic complexity of the self-attention mechanism. To address this challenge, we