GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models
arXiv cs.AI
arXiv:2603.09079v1 Announce Type: cross. Vision-language-action (VLA) models encode visual observations as 2D patch tokens with no intrinsic geometric structure. We …
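To make the "2D patch tokens" point concrete, here is a minimal sketch of generic ViT-style patch tokenization. It is not GST-VLA's method and not the encoder of any specific VLA model. The function name `patchify` is hypothetical, and the learned linear projection and position embeddings that real encoders apply afterward are omitted. The sketch shows that each token is just a flattened tile of pixels tagged with a (row, column) grid index in the image plane. Nothing in the token records depth or 3D position.

```python
from typing import List, Tuple

# Hypothetical representation: an H x W x C image as nested lists of floats.
Image = List[List[List[float]]]


def patchify(image: Image, patch: int) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles.

    Returns one flattened vector per tile plus that tile's 2D grid index.
    The grid index is the only spatial information attached to a token;
    there is no depth or 3D coordinate anywhere in the output.
    """
    h = len(image)
    w = len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch size")

    tokens: List[List[float]] = []
    positions: List[Tuple[int, int]] = []
    for gy in range(h // patch):          # patch-grid row
        for gx in range(w // patch):      # patch-grid column
            vec: List[float] = []
            # Concatenate the channel values of every pixel in the tile, row-major.
            for dy in range(patch):
                for dx in range(patch):
                    vec.extend(image[gy * patch + dy][gx * patch + dx])
            tokens.append(vec)
            positions.append((gy, gx))
    return tokens, positions


if __name__ == "__main__":
    # A 4 x 4 RGB image whose pixel values encode their own (y, x) location.
    img = [[[float(y), float(x), 0.0] for x in range(4)] for y in range(4)]
    toks, pos = patchify(img, 2)
    print(len(toks), len(toks[0]))  # 4 tokens, each 2 * 2 * 3 = 12 values
    print(pos)                      # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this sketch a 4 x 4 image with patch size 2 yields four 12-dimensional tokens on a 2 x 2 grid. Any notion of how far a surface is from the camera would have to be inferred downstream, which is the gap the abstract attributes to standard VLA visual encoders.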