GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

arXiv cs.AI

arXiv:2603.09079v1 Announce Type: cross

Vision-Language-Action (VLA) models encode visual observations as 2D patch tokens with no intrinsic geometric structure. We