ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths

🇺🇸 LDC Newsletter April 2026

Location: United States

LDC releases new language packs and corpora for natural language processing research

New publications from LDC include DEFT Chinese and English Light and Rich ERE Parallel Annotation, MATERIAL Tagalog-English Language Pack, and LORELEI Somali Representative Language Pack.

  • DEFT Chinese and English Light and Rich ERE Parallel Annotation consists of 179 Chinese discussion forum documents and their English translations, annotated for entities, relations, and events.
  • MATERIAL Tagalog-English Language Pack contains 100 hours of Tagalog conversational telephone speech, transcripts, English translations, annotations, and queries.
  • LORELEI Somali Representative Language Pack contains over 13 million words of Somali monolingual text, with translations and annotations.

LDC members for 2026 can access these corpora through their LDC accounts. Non-members may license the data for a fee.

Tags: LDC, natural language processing, language packs, Somali language, Tagalog language, Chinese language, entity recognition, relation extraction, event extraction, United States