🇺🇸 LDC Newsletter April 2026
Location: United States
LDC releases new language packs and corpora for natural language processing research
New publications from LDC include DEFT Chinese and English Light and Rich ERE Parallel Annotation, MATERIAL Tagalog-English Language Pack, and LORELEI Somali Representative Language Pack.
- DEFT Chinese and English Light and Rich ERE Parallel Annotation consists of 179 Chinese discussion forum documents and their English translations annotated for entities, relations, and events.
- MATERIAL Tagalog-English Language Pack contains 100 hours of Tagalog conversational telephone speech, transcripts, English translations, annotations, and queries.
- LORELEI Somali Representative Language Pack contains over 13 million words of Somali monolingual text, with translations and annotations.
LDC members for 2026 can access these corpora through their LDC accounts. Non-members may license the corpora for a fee.
Tags: LDC, natural language processing, language packs, Somali language, Tagalog language, Chinese language, entity recognition, relation extraction, event extraction, United States