Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models

arXiv:2602.20501v2

What does it mean for a visual system to truly understand affordance? We argue that this understanding hinges on two complementary capacities: geometric perception, which identifies the structural parts of objects that enable interaction, and interaction perception, which models how an agent's actions engage with those parts. To test this hypothesis, we systematically probe Visual Foundation Models (VFMs