AI RESEARCH

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

arXiv cs.CV

arXiv:2602.04672v2 Announce Type: replace Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and for creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limitations, we