ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths


LDC Newsletter: New Publications and Data Licensing

LDC releases new publications and updates on data licensing, including the 2015 NIST Language Recognition Evaluation Test Set and the Xi’an Multi-Language Learner Corpus.

The Linguistic Data Consortium (LDC) has released its March 2025 newsletter, which announces new publications and provides updates on data licensing. The new publications include the 2015 NIST Language Recognition Evaluation Test Set and the Xi’an Multi-Language Learner Corpus.

The 2015 NIST Language Recognition Evaluation Test Set contains approximately 867 hours of conversational telephone speech and broadcast narrowband speech in 20 languages.

The Xi’an Multi-Language Learner Corpus comprises 526 argumentative essays in 15 languages, written by university students whose first language (L1) is Chinese and who are studying second languages.

  • 2025 members can access these corpora through their LDC accounts.
  • Non-members may license this data for a fee.

Tags: Linguistic Data Consortium, LDC, language recognition, evaluation test set, multi-language learner corpus, second language research, cross-linguistic comparison