AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis

arXiv:2603.08021v1 Announce Type: cross

Generating human grasping poses that accurately reflect both object geometry and user-specified interaction semantics is essential for natural hand-object interactions in AR/VR and embodied AI. However, existing semantic grasping approaches struggle with the large modality gap between 3D object representations and textual instructions, and often lack explicit spatial or semantic constraints, leading to physically invalid or semantically inconsistent grasps.