ML Scientist



LREC2026 Tutorial: Building Corpora for Inclusive Language Technologies

An LREC2026 tutorial on building corpora for low-resource languages, covering data collection, machine translation, and downstream applications.

A tutorial titled ‘Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies’ has been announced for LREC2026. It is aimed at NLP practitioners, researchers, and developers working with multilingual and low-resource languages.

The tutorial will cover the full lifecycle of NLP technology development, including data collection, corpus creation, parallel sentence mining, and machine translation.

  • Case studies on 10+ languages from diverse language families
  • Coverage of both digitally resource-rich and underrepresented languages
  • Hands-on methods and applied modeling frameworks

Save the date: Saturday, 16 May 2026, morning session, Room 6. More information: https://tum-nlp.github.io/low-resource-tutorial/

Researchers working on non-mainstream languages are invited to share their experiences via this form.

Tags: LREC2026, NLP, low-resource languages, language technologies, corpus creation, multilingual technologies