LREC2026 Tutorial: Building Corpora for Inclusive Language Technologies
LREC2026 tutorial on building corpora for low-resource languages, covering data collection, machine translation, and downstream applications.
The tutorial ‘Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies’ has been announced for LREC2026. It targets NLP practitioners, researchers, and developers working on multilingual and low-resource language technologies.
The tutorial will cover the full lifecycle of NLP technology development, including data collection, corpus creation, parallel sentence mining, and machine translation. It features:
- Case studies on 10+ languages from diverse language families
- Coverage of both digitally resource-rich and underrepresented languages
- Hands-on methods and applied modeling frameworks
Save the date: Saturday, 16 May 2026, morning session, Room 6. More information: https://tum-nlp.github.io/low-resource-tutorial/
Researchers working on non-mainstream languages are invited to share their experiences via this form.
Tags: LREC2026, NLP, low-resource languages, language technologies, corpus creation, multilingual technologies