LDC June 2025 Newsletter: New Publications & Data Licensing
LDC June 2025 newsletter: New publications, data licensing info, and commercial technology development guidelines.
The Linguistic Data Consortium (LDC) has released its June 2025 newsletter, highlighting new publications and important information on data licensing for commercial technology development.
New publications include:
- Chinese Sentence Pattern Structure Treebank, developed at Beijing Normal University and Peking University, containing syntactic annotations for 5,016 sentences and 119,627 tokens.
- IWSLT 2022-2023 Shared Task Training, Development and Test Set, containing 210 hours of Tunisian Arabic conversational telephone speech, transcripts, English translations, and speaker metadata.
- KAIROS Schema Learning Complex Event Annotation, developed to support the DARPA KAIROS program, containing English and Spanish text, audio, video, and image data labeled for 93 real-world complex events.
LDC reminds for-profit organizations that membership is required to obtain a commercial license to most LDC databases. Users should consult the license agreement for each corpus for limitations on data use.
Tags: LDC, Linguistic Data Consortium, data licensing, natural language processing, NLP, corpus, dataset