ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths


Universal Dependencies v2.14: A Major Release of Annotated Treebanks

We are thrilled to announce the release of Universal Dependencies v2.14, a comprehensive collection of annotated treebanks for 161 languages. The release includes 283 treebanks from 31 language families, ranging in size from fewer than 1,000 tokens to over 3 million tokens.

The new release contains 1,906,050 sentences, 31,541,523 surface tokens, and 32,179,731 syntactic words. It also includes significant changes to 39 treebanks, such as Abkhaz AbNC, Dutch LassySmall, and English GUM.
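Surface tokens and syntactic words differ because of multiword tokens. In CoNLL-U, the format UD treebanks are distributed in, a contraction such as Spanish "del" appears as one range line (ID "2-3") followed by one line per syntactic word ("de", "el"). The minimal Python sketch below counts both under those rules. The function name count_units and the sample sentence are illustrative, the sketch assumes empty nodes (IDs such as "1.1") count as neither tokens nor words, and it is not the official UD statistics tooling.

```python
# Minimal sketch: count sentences, surface tokens, and syntactic words in CoNLL-U text.
# Assumption: empty nodes (IDs such as "1.1") count as neither tokens nor words.

# Illustrative Spanish sentence: the surface token "del" (range 2-3) splits into "de" + "el".
SAMPLE = (
    "# text = Vive del arte.\n"
    "1\tVive\tvivir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2-3\tdel\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_\n"
    "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tarte\tarte\tNOUN\t_\t_\t1\tobl\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)


def count_units(conllu: str) -> dict[str, int]:
    counts = {"sentences": 0, "tokens": 0, "words": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest multiword-token range
    for line in conllu.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if in_sentence:
                counts["sentences"] += 1
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):  # sentence-level comments such as sent_id and text
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "." in token_id:  # empty node: skipped under the assumption above
            continue
        if "-" in token_id:  # multiword token such as "2-3": one surface token
            counts["tokens"] += 1
            covered_until = int(token_id.split("-")[1])
            continue
        counts["words"] += 1  # every integer ID is one syntactic word
        if int(token_id) > covered_until:  # not inside a range, so also a surface token
            counts["tokens"] += 1
    if in_sentence:  # input may end without a trailing blank line
        counts["sentences"] += 1
    return counts


print(count_units(SAMPLE))  # {'sentences': 1, 'tokens': 4, 'words': 5}
```

Because one surface token can expand into several syntactic words, word counts can exceed token counts, as they do in the release totals above.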

The Universal Dependencies project aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
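In CoNLL-U, each syntactic word is one tab-separated line with ten columns. Loosely, the annotation layers named above show up in different columns: universal part-of-speech tags in UPOS, morphosyntactic features in FEATS, and dependency relations in HEAD and DEPREL. The Python sketch below reads those columns and lists (head, relation, dependent) triples. The sample sentence and its tags were written for this illustration rather than copied from a treebank, and the helper names parse_words and relations are hypothetical.

```python
# Minimal sketch: read CoNLL-U word lines into their ten columns and list dependency triples.

FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")

# Illustrative English sentence (tags written for this example, not copied from a treebank).
SAMPLE = (
    "# text = The cats sleep.\n"
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
    "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_\n"
    "3\tsleep\tsleep\tVERB\tVBP\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)


def parse_words(conllu: str) -> list[dict]:
    words = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):  # blank lines and comments
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges ("2-3") and empty nodes ("1.1")
            continue
        word = dict(zip(FIELDS, cols))
        # FEATS is "_" or a "|"-separated list of Feature=Value pairs
        word["feats"] = (
            {} if word["feats"] == "_"
            else dict(pair.split("=", 1) for pair in word["feats"].split("|"))
        )
        words.append(word)
    return words


def relations(words: list[dict]) -> list[tuple[str, str, str]]:
    # HEAD "0" marks the root of the sentence
    forms = {"0": "ROOT", **{w["id"]: w["form"] for w in words}}
    return [(forms[w["head"]], w["deprel"], w["form"]) for w in words]


words = parse_words(SAMPLE)
print(words[1]["upos"], words[1]["feats"])  # NOUN {'Number': 'Plur'}
for head, rel, dep in relations(words):
    print(f"{head} --{rel}--> {dep}")  # e.g. "cats --det--> The"
```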

The release was made possible by the contributions of 457 researchers and developers. We expect the next release to be available in November 2024.

For more information, please visit https://universaldependencies.org/.
