New Opportunity in Multilingual Speech Generation with Neural Codecs
A new challenge has been announced on developing multilingual streaming Text-to-Speech (TTS) systems based on neural codecs for Indian languages. The challenge provides TTS data for Indian English, Kannada, Gujarati, and Bhojpuri, with one male and one female speaker per language. The initiative is part of the SYSPIN project at SPIRE lab, Indian Institute of Science (IISc), Bangalore, India.
The challenge is an excellent opportunity for researchers to contribute to real-time, adaptable, and high-quality speech generation. Recent advances in conversational AI have increased the demand for TTS systems that are low-latency, multilingual, and controllable. In applications built on Large Language Models (LLMs), streaming TTS is essential, because it lets synthesized speech start playing before the full text response has been generated.
Neural codec-based TTS systems have achieved state-of-the-art performance. Neural codecs represent speech compactly as sequences of discrete codes, which enables efficient transmission and storage. Because various speech attributes can also be encoded in these representations, codec-based systems support high-quality and controllable speech synthesis.
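To make "compact representation" concrete, the rough calculation below compares the bitrate of a neural codec with that of uncompressed audio. It assumes a multi-codebook codec (for example, one using residual vector quantization) that emits one index from each codebook per frame. The frame rate, number of codebooks, and codebook size are hypothetical illustrative values, not parameters specified by the challenge.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    # Each frame emits one index per codebook, and each index
    # needs log2(codebook_size) bits to encode.
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_index


def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    # Uncompressed PCM stores every sample of every channel at full bit depth.
    return sample_rate_hz * bit_depth * channels


# Hypothetical settings, chosen only for illustration.
codec_bps = codec_bitrate_bps(frame_rate_hz=75, num_codebooks=8, codebook_size=1024)
pcm_bps = pcm_bitrate_bps(sample_rate_hz=24_000, bit_depth=16)

print(f"codec: {codec_bps / 1000:.1f} kbps")              # codec: 6.0 kbps
print(f"PCM:   {pcm_bps / 1000:.1f} kbps")                # PCM:   384.0 kbps
print(f"compression ratio: {pcm_bps / codec_bps:.0f}x")   # compression ratio: 64x
```

Under these assumed settings, one second of speech becomes 600 discrete codes (75 frames × 8 codebooks). This token sequence is what a codec-based TTS model predicts, and it can be generated and decoded incrementally for streaming.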