MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv:2604.12928v1 Announce Type: new Speech-to-speech language models have recently emerged to enhance the naturalness of conversational AI. In particular, full-duplex models are distinguished by their real-time interactivity, including handling of pauses, interruptions, and backchannels. However, improving their factuality remains an open challenge. While scaling the model size could address this gap, it would make real-time inference prohibitively expensive.