Latent Speech-Text Transformer
arXiv CS.AI
arXiv:2510.06195v2 Announce Type: replace-cross Auto-regressive speech-text models pre-trained on interleaved text tokens and discretized speech tokens demonstrate strong speech understanding and generation, yet remain substantially less compute-efficient than text LLMs, partly due to the much longer sequences of speech tokens relative to text. This modality imbalance disproportionately allocates pre-