AI RESEARCH

GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models

arXiv CS.AI • March 11, 2026

arXiv:2603.09079v1 Announce Type: cross

Abstract: Vision-Language-Action (VLA) models encode visual observations as 2D patch tokens with no intrinsic geometric structure. We