Text and Data Mining

Information on text and data mining resources available through the Library

Linguistic Data Consortium

The University of Chicago has subscribed to the Linguistic Data Consortium (LDC) since 2001, so authorized UC users have access to all corpora that LDC has produced from 2001 to the present. In addition, we have separately acquired a small number of LDC corpora from 1992-2000. If you need corpora from these early years that we lack, please contact the Linguistics Bibliographer.

Note that some corpora are available on CD, DVD, or USB and can be accessed by individual title through the Library Catalog. Many corpora are available online, and a further subset is available both physically and online. Many of these are available for download. To determine which corpora are available for downloading:

  • Register with LDC as an authorized University of Chicago user (see directions below) and wait for email confirmation
  • On the LDC web site, click the MEMBERS tab
  • Click the INTRANET link and log in
  • Click the CORPORA AVAILABLE FOR DOWNLOAD link to see which corpora are available to University of Chicago users

Linguistic Corpora