AI RESEARCH
VL-KnG: Persistent Spatiotemporal Knowledge Graphs from Egocentric Video for Embodied Scene Understanding
arXiv CS.AI
arXiv:2510.01483v2 Announce Type: replace-cross Vision-language models (VLMs) demonstrate strong image-level scene understanding but often lack persistent memory, explicit spatial representations, and computational efficiency when reasoning over long video sequences. We present VL-KnG, a