ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths


New Language Resources: Comprehensive Arabic Phonetic Database and EthioSpeech Corpora

ELRA announces two new language resources: the Comprehensive Arabic Phonetic Database, which offers phonemic and phonetic transcriptions of Modern Standard Arabic, and the EthioSpeech Corpora, which contains recorded read speech in six Ethiopian languages.

ELRA is pleased to announce the addition of two new language resources to its catalogue: the Comprehensive Arabic Phonetic Database and the EthioSpeech Corpora.

The Comprehensive Arabic Phonetic Database is a detailed linguistic resource that provides both phonemic and phonetic transcriptions, reflecting how Modern Standard Arabic words are realized in actual speech. It contains over 329,000 entries, including general vocabulary, Arab personal names, foreign personal names in Arabic, and worldwide place names.

The EthioSpeech Corpora comprises over 391 hours of recorded read speech in six Ethiopian languages, with approximately 200 speakers per language. The dominant domain is media, but texts from other domains were used for some languages.

For more information on these resources or to enquire about having your resources distributed by ELRA, please contact us.

Tags: language resources, Arabic phonetic database, EthioSpeech Corpora, linguistic resources, speech corpora, language learning, natural language processing