ML Scientist

🇺🇸 GUM Corpus V12.0.0 Released with New Documents and Annotations

Location: United States

Version 12.0.0 of the Georgetown University Multilayer corpus (GUM) has been released, adding new documents and annotations.

Key updates include:

  • 291,056 tokens
  • Reworked GUMBridge annotation scheme for bridging anaphora
  • Manual re-annotation of the entire corpus
  • 11 subtypes of bridging anaphora

GUM is an open-source corpus of richly annotated English texts from 24 genres, available under Creative Commons licenses.

For more information, visit the corpus website.
