ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths


Universal Dependencies v2.15 Release: A Significant Advancement in Cross-Linguistic Treebank Annotation

The latest release of Universal Dependencies provides cross-linguistic treebank annotation for 168 languages, supporting multilingual parser development, cross-lingual learning, and research on language typology.

We are thrilled to announce the release of Universal Dependencies v2.15. This release includes annotated treebanks for 168 languages from 33 families, with a total of 1,939,085 sentences, 32,078,118 surface tokens, and 32,741,781 syntactic words. The treebanks range in size from fewer than 1,000 to over 3 million tokens. The release also includes updates to 24 treebanks that have undergone significant changes since the last release.
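The release figures distinguish surface tokens from syntactic words, and the word count is the larger of the two because a single surface token (for example a contraction) can be split into several syntactic words. The sketch below illustrates that distinction on a tiny sample in the CoNLL-U format used for UD treebanks. The function name `count_conllu` and the sample sentence are illustrative assumptions, and the official release statistics may be computed differently.

```python
def count_conllu(text):
    """Count sentences, surface tokens, and syntactic words in CoNLL-U text.

    Assumed counting rule (a sketch, not the official UD statistics script):
    - an ID like "1-2" is a multiword token: one surface token
    - an integer ID is a syntactic word; it is also a surface token
      unless it falls inside the preceding multiword token range
    - an ID like "8.1" is an empty node and is not counted
    """
    sentences = tokens = words = 0
    in_sentence = False
    mwt_end = 0  # last word ID covered by the current multiword token
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if in_sentence:
                sentences += 1
            in_sentence = False
            mwt_end = 0
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            tokens += 1
            mwt_end = int(token_id.split("-")[1])
        elif "." in token_id:
            continue
        else:
            words += 1
            if int(token_id) > mwt_end:
                tokens += 1
    if in_sentence:  # file may end without a trailing blank line
        sentences += 1
    return sentences, tokens, words


# Illustrative sample: Spanish "vámonos al mar"; only ID and FORM are filled in,
# the remaining eight CoNLL-U columns are left as "_".
rows = [("1-2", "vámonos"), ("1", "vamos"), ("2", "nos"),
        ("3-4", "al"), ("3", "a"), ("4", "el"), ("5", "mar")]
sample = "# text = vámonos al mar\n" + "\n".join(
    "\t".join([row_id, form] + ["_"] * 8) for row_id, form in rows
) + "\n"

print(count_conllu(sample))
```

In this sample the three surface tokens ("vámonos", "al", "mar") expand to five syntactic words, which is the same kind of gap seen between the two totals in the release.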

Universal Dependencies is a project that aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on an evolution of (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets. Together these provide a universal inventory of categories and guidelines for consistent annotation across languages.

Tags: Universal Dependencies, treebank annotation, multilingual parser development, cross-lingual learning, language typology, Stanford, Google universal part-of-speech tags, Interset interlingua, consistent annotation
