ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths

News

Opera Latina Adnotata (v0.2.0) Released

A new version of Opera Latina Adnotata, a multilayer Latin corpus, has been released. The corpus comprises 736 texts and more than 17 million tokens, and can be searched by the following criteria:

  • word form
  • lemma
  • morphology (POS and morphological features)
  • syntax (dependency syntax following the AGDT annotation scheme)
  • CTS URN for work, author, and edition
  • CTS structure (e.g., ‘book’, ‘section’)
  • author name
  • work title
  • (experimental) IPA transcription of word forms (Classical Latin pronunciation)
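
For readers unfamiliar with CTS URNs, the following is a minimal Python sketch of how such an identifier breaks down into the author, work, and edition levels listed above. It assumes the general CTS URN layout (`urn:cts:namespace:textgroup.work.version:passage`), with the text group standing for the author and the version for the edition. The function name `parse_cts_urn` and the sample URN are illustrative only and are not taken from the corpus or its tooling; the optional exemplar level is not handled.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # e.g. "latinLit"
    textgroup: str            # usually identifies the author
    work: Optional[str]       # the individual work, if given
    version: Optional[str]    # the edition or translation, if given
    passage: Optional[str]    # citation such as "1.1" (book.section), if given


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components (illustrative helper)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 and parts[4] else None

    # The work component is dot-separated: textgroup[.work[.version]]
    ids = work_component.split(".")
    if not 1 <= len(ids) <= 3 or not all(ids):
        raise ValueError(f"unsupported work component: {work_component!r}")
    ids += [None] * (3 - len(ids))  # pad missing levels with None

    return CtsUrn(namespace, ids[0], ids[1], ids[2], passage)


if __name__ == "__main__":
    # Sample URN chosen for illustration; it may not match the corpus exactly.
    print(parse_cts_urn("urn:cts:latinLit:phi0448.phi001.perseus-lat2:1.1"))
```

Splitting the identifier this way shows why a single URN can drive searches at several levels: dropping the passage selects a whole edition, and dropping the version selects every edition of a work.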

The data is hosted on Zenodo and can be queried online through ANNIS. More information can be found in the associated repository on GitHub.

Tags: Opera Latina Adnotata, Latin corpus, multilayer corpus, linguistics, natural language processing, corpus linguistics, Latin language, text analysis