AI RESEARCH
SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning
arXiv CS.CV
arXiv:2603.11563v1 Announce Type: new Embodied task planning requires vision-language models to generate action sequences that are both visually grounded and causally coherent over time. However, existing