Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation

arXiv:2502.13637v2 Announce Type: replace

Human affordance learning investigates the prediction of contextually relevant novel poses, such that the estimated pose represents a valid human action within the scene. While the task is fundamental to machine perception and automated interactive navigation agents, the exponentially large number of probable pose and action variations makes the problem challenging and non-trivial. Existing datasets and methods for human affordance prediction in 2D scenes are also significantly limited.