tts

2024-Voicebox: Text-guided multilingual universal speech generation at scale

See [LVS+24].

2024-E2 TTS: Embarrassingly easy fully non-autoregressive zero-shot TTS

See :cite:p:`eskimez2024e2`.

Text to speech

LVS+24

Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, and others. Voicebox: Text-guided multilingual universal speech generation at scale. Advances in Neural Information Processing Systems, 2024. URL: https://arxiv.org/pdf/2306.15687.