AI RESEARCH

End-to-End Spatial-Temporal Transformer for Real-time 4D HOI Reconstruction

arXiv CS.CV

arXiv:2603.14435v1 Announce Type: new Monocular 4D human-object interaction (HOI) reconstruction, which recovers a moving human and a manipulated object from a single RGB video, remains challenging due to depth ambiguity and frequent occlusions. Existing methods often rely on multi-stage pipelines or iterative optimization, which leads to inference latency too high for real-time use and makes them susceptible to error accumulation.