COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

arXiv:2503.07259v2 Announce Type: replace-cross The goal of creating intelligent, human-centered wearable systems for continuous activity understanding faces a fundamental trade-off: egocentric video-based models capture rich semantic information and have demonstrated strong performance in human activity recognition (HAR), but their high power consumption, privacy concerns, and dependence on lighting limit their feasibility for continuous on-device recognition.