VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing

arXiv:2510.05213v2 Announce Type: replace-cross Pretrained vision foundation models (VFMs) advance robotic learning via rich visual representations, yet individual VFMs typically excel only in specific domains, limiting generality across tasks. Distilling multiple VFMs into a unified representation for policy learning can mitigate this limitation but often yields inflexible task-specific feature selection and requires costly full re-