
Latent Speech-Text Transformer

arXiv CS.AI

arXiv:2510.06195v2 Announce Type: replace-cross Auto-regressive speech-text models pre-trained on interleaved text tokens and discretized speech tokens demonstrate strong speech understanding and generation, yet remain substantially less compute-efficient than text LLMs, partly due to the much longer sequences of speech tokens relative to text. This modality imbalance disproportionately allocates pre-