Efficient Training for Cross-lingual Speech Language Models

arXiv:2604.11096v1 Announce Type: cross Currently, large language models (LLMs) predominantly focus on the text modality. To enable natural human-AI interaction, speech LLMs are emerging, but building effective end-to-end speech LLMs remains challenging due to limited data and the difficulty of expanding to additional languages. In this paper, we