AI RESEARCH
LongAudio-RAG: Event-Grounded Question Answering over Multi-Hour Long Audio
arXiv cs.LG
arXiv:2602.14612v3 Announce Type: replace-cross Long-duration audio is increasingly common in industrial and consumer settings, yet reviewing multi-hour recordings is impractical, motivating systems that answer natural-language queries with precise temporal grounding and minimal hallucination. Existing audio-language models show promise, but long-audio question answering remains difficult due to context-length limits. We