UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

arXiv:2604.19221v1 Announce Type: new Full-duplex speech interaction, as the most natural and intuitive mode of human communication, is driving artificial intelligence toward human-like conversational systems. Traditional cascaded speech processing pipelines suffer from critical limitations, including accumulated latency, information loss, and error propagation across modules. To address these issues, recent efforts have focused on end-to-end audio large language models (LLMs) such as GPT-4o, which primarily unify speech understanding and generation tasks.