RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

arXiv:2603.29419v1 Announce Type: cross Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocalize contact points and mispredict post-contact actions when applied to unseen categories, thereby hindering robust generalization. We