Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis

ArXi:2409.03597v4 Announce Type: replace-cross This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), a novel system that leverages both audio and video data to automatically extract key video segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment. The system integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data, identifying patient vocalizations and refining video highlights to ensure optimal inspection of vocal fold movements.