AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

arXiv:2511.18960v3 Announce Type: replace Vision-Language-Action (VLA) models have recently shown remarkable progress in embodied tasks, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov Decision Process, even though real-world robotic control is inherently partially observable and requires reasoning over past interactions.