AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation

arXiv:2603.15046v1 Announce Type: cross In this study, we address the problem of language-guided robotic manipulation, where a robot is required to manipulate a wide range of objects based on visual observations and natural language instructions. This task is essential for service robots that operate in human environments and requires safety, efficiency, and task-level generality.