Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

arXiv:2604.23121v1 Announce Type: cross

Have you ever post-trained a generalist vision-language-action (VLA) policy on a small demonstration dataset, only to find that it stops responding to new instructions and is limited to behaviors observed during post-training?