From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models

ArXi:2501.00296v4 Announce Type: replace-cross Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of short-horizon nstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic predicates that define properties of and relationships between objects.