ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference

arXiv cs.CV

arXiv:2507.10800v3. Vision Transformers (ViTs) deliver state-of-the-art performance, yet their fixed computational budget prevents scalable deployment across heterogeneous hardware. Recent Matryoshka-style Transformer architectures mitigate this by embedding nested subnetworks within a single model, so that inference cost can be scaled to the available hardware. However, these models allocate the same amount of compute to every input regardless of its complexity, which is inefficient. To address this, we