AI RESEARCH

LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos

arXiv CS.CV

arXiv:2603.09743v1 Announce Type: new Procedure planning requires a model to predict a sequence of actions that transform a start visual observation into a goal in instructional videos. While most existing methods rely primarily on visual observations as input, they often struggle with the inherent ambiguity where different actions can appear visually similar. In this work, we argue that language descriptions offer a distinctive representation in the latent space for procedure planning. We.