UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking

arXiv:2512.09327v2 Announce Type: replace Generating lifelike conversational avatars requires modeling not just isolated speakers, but the dynamic, reciprocal interaction of speaking and listening. However, modeling the listener is exceptionally challenging: direct audio-driven