Uni-HOI: A Unified framework for Learning the Joint distribution of Text and Human-Object Interaction

arXiv:2604.27491v1 Announce Type: new Modeling 4D human-object interaction (HOI) is a compelling challenge in computer vision and an essential technology powering virtual and mixed-reality applications. While existing works have achieved promising results on specific HOI tasks, such as text-conditioned HOI generation and human motion generation from object motion, they typically rely on task-specific architectures and lack a unified framework capable of handling diverse conditional inputs.