AI RESEARCH
M2H-MX: Multi-Task Dense Visual Perception for Real-Time Monocular Spatial Understanding
arXiv cs.CV
arXiv:2603.29236v1 Announce Type: new Monocular cameras are attractive for robotic perception due to their low cost and ease of deployment, yet achieving reliable real-time spatial understanding from a single image stream remains challenging. While recent multi-task dense prediction models have improved per-pixel depth and semantic estimation, translating these advances into stable monocular mapping systems is still non-trivial. This paper presents M2H-MX, a real-time multi-task perception model for monocular spatial understanding.