ML Scientist

Connecting Scholars with the Latest Academic News and Career Paths


LDC July 2025 Newsletter: New Publications and Scholarship

LDC July 2025 newsletter: Fall data scholarship program applications open until September 15, 2025; new publications include AnnoDIFP, Penn Parsed Corpora, and LoReHLT Uzbek Representative Language Pack.

The LDC July 2025 newsletter highlights the Fall 2025 LDC data scholarship program and new publications, including AnnoDIFP Session Audio and Transcripts, Penn Parsed Corpora of Historical English Second Release, and LoReHLT Uzbek Representative Language Pack.

  • The Fall 2025 LDC data scholarship program is accepting applications until September 15, 2025. Eligible students can gain no-cost access to LDC data by submitting a data use proposal and a letter of support from their advisor. For more information, visit the LDC Data Scholarships page.
  • AnnoDIFP Session Audio and Transcripts (LDC2025S06) contains 438.34 hours of English audio and transcripts from in-person interviews, supporting algorithm development for predicting personality traits.
  • Penn Parsed Corpora of Historical English Second Release (LDC2025T09) corrects errors and inconsistencies in the previous release, streamlines annotation, and includes updated documentation.
  • LoReHLT Uzbek Representative Language Pack (LDC2025T08) comprises approximately 47 million words of Uzbek monolingual text, parallel text, and annotated data, supporting human language technology development.

These new publications are available to 2025 members through their LDC accounts and can be licensed by non-members.

Tags: LDC, data scholarship, AnnoDIFP, Penn Parsed Corpora, LoReHLT, Uzbek language pack, linguistic resources