AI RESEARCH
Patient-Level Multimodal Question Answering from Multi-Site Auscultation Recordings
arXiv CS.AI
arXiv:2603.13362v1 Announce Type: cross
Auscultation is a vital diagnostic tool, yet its utility is often limited by subjective interpretation. While general-purpose Audio-Language Models (ALMs) perform well on broad audio tasks, they struggle with the nuances of physiological signals. We propose a framework that aligns multi-site auscultation recordings directly with the embedding space of a frozen Large Language Model (LLM) via gated cross-attention. By leveraging the LLM's latent world knowledge, our approach moves beyond isolated classification toward holistic, patient-level assessment.
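The abstract does not give the details of the gated cross-attention layer, so the following is only a minimal sketch of one common form of the idea: text token states attend over audio features, and the attended result is added back through a tanh gate. The single attention head, the tanh gate, the concatenation of per-site features into one sequence, and every name and dimension below (`gated_cross_attention`, `site_a`, `site_b`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def gated_cross_attention(text, audio, w_q, w_k, w_v, gate):
    """Single-head cross-attention with a tanh-gated residual (assumed form).

    text:  T x d token states from the frozen LLM (queries)
    audio: S x d audio features (keys and values)
    gate:  scalar; tanh(gate) scales the attended audio added to the text
    """
    q = matmul(text, w_q)
    k = matmul(audio, w_k)
    v = matmul(audio, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose keys to d x S
    scores = matmul(q, k_t)                       # T x S similarity scores
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, v)                 # T x d audio summary per token
    g = math.tanh(gate)
    return [[t + g * a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(text, attended)]


rng = random.Random(0)
d = 4
text = random_matrix(3, d, rng)                   # 3 text tokens
site_a = random_matrix(5, d, rng)                 # hypothetical recording site 1
site_b = random_matrix(5, d, rng)                 # hypothetical recording site 2
audio = site_a + site_b                           # assumed: concatenate sites along time
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

closed = gated_cross_attention(text, audio, w_q, w_k, w_v, gate=0.0)
opened = gated_cross_attention(text, audio, w_q, w_k, w_v, gate=1.0)
```

With `gate=0.0` the layer returns the text states unchanged, which is why a gate of this kind is often initialized at zero: the frozen LLM initially behaves exactly as it did before any audio was attached, and the audio contribution grows only as the gate is trained.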