AI RESEARCH

VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

arXiv cs.CV

arXiv:2512.09646v2 Announce Type: replace

Synthesizing realistic human-object interactions (HOI) in video is challenging because the interaction dynamics of both humans and objects are complex and instance-specific. Making the generation controllable adds further complexity. Existing controllable video generation approaches face a trade-off: sparse controls such as keypoint trajectories are easy to specify but lack instance-awareness, while dense signals such as optical flow, depth maps, or 3D meshes are informative but costly to obtain.