AI RESEARCH
Perceive What Matters: Relevance-Driven Scheduling for Multimodal Streaming Perception
arXiv cs.CV
arXiv:2603.13176v1. In modern human-robot collaboration (HRC) applications, multiple perception modules jointly extract visual, auditory, and contextual cues to build a comprehensive understanding of the scene, so that the robot can provide appropriate assistance to human agents. Running every perception module on every frame improves perception quality in offline settings, but in streaming perception the per-frame latency accumulates and substantially degrades system performance.