ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations

arXiv:2605.07474v1 Announce Type: cross Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated