Int3DNet: Scene-Motion Cross Attention Network for 3D Intention Prediction in Mixed Reality

arXiv:2603.13355v1 Announce Type: new We propose Int3DNet, a scene-aware network that predicts 3D intention areas directly from scene geometry and head-hand motion cues, enabling robust human intention prediction without explicit object-level perception. In Mixed Reality (MR), intention prediction is critical because it lets the system anticipate user actions and respond proactively, reducing interaction delays and ensuring seamless user experiences.
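
To make the "scene-motion cross attention" in the title concrete, the following is a minimal sketch of generic scaled dot-product cross attention, not the authors' architecture. It assumes that head-hand motion features act as queries and scene-geometry features act as keys and values; the abstract does not state this. The function `cross_attention`, the token lists, and the toy feature values are all hypothetical, and learned projections are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(motion_tokens, scene_tokens):
    """Each motion token (query) attends over all scene tokens (keys/values).

    Assumption: no learned query/key/value projections, so this is only
    the attention step itself, shown for illustration.
    """
    dim = len(scene_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for query in motion_tokens:
        # Scaled dot product between the query and every scene token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in scene_tokens
        ]
        w = softmax(scores)
        # Weighted sum of scene tokens gives the scene-aware motion feature.
        out = [
            sum(w[j] * scene_tokens[j][i] for j in range(len(scene_tokens)))
            for i in range(dim)
        ]
        fused.append(out)
        weights.append(w)
    return fused, weights


# Hypothetical toy features: three scene regions, two motion cues.
scene_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
motion_tokens = [[4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
fused, attn = cross_attention(motion_tokens, scene_tokens)
```

In this toy setup each row of `attn` is a distribution over scene regions, so the largest weight marks the region a given motion cue is most aligned with. A real model would add learned projections and a prediction head that maps the fused features to 3D intention areas.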