
Text and Data Mining

Information on text and data mining resources available through the Library

Using Ebooks and Journals for Text Mining

Many of the publishers we license content from allow text mining but restrict how it can be done. In most cases you must use an API rather than web scraping. Contact us if you are interested in a publisher that is not listed here, and we will advise you of your options.

Brill

Brill permits text mining of its publications that the Library has purchased; however, this content cannot be downloaded automatically. Please contact us with the specifics of your project, and we will work with Brill to provide the content you need.

Cambridge University Press

Cambridge University Press includes TDM in their terms and conditions:

You may download, extract, store and index Content for the purposes of text and data mining ("TDM") for non-commercial research purposes only. Any copies of the Content stored locally by you for the purposes of TDM shall be deleted once such research project ends. 

However, they do not allow the use of automated tools, and they limit downloads to 500 PDF files per hour. Contact Cambridge at openresearch@cambridge.org if you are interested in programmatic access.

Constellate (formerly JSTOR Data for Research)

Constellate is a text analytics service from the parent company of JSTOR. The Library subscribes at the Pedagogy level, which allows you to create and save up to five datasets using materials from JSTOR and partner publishers. Constellate does not allow downloading for local analysis, but does provide online tools, including online Jupyter Notebooks with Python. 

Create a JSTOR account to access the online tools and save datasets. You can also use the "login with Google" option to sign in with your CNetID.

Elsevier ScienceDirect

Elsevier allows text mining of our subscribed content using an API. Register for API access at their Developer Portal.

Access to sources outside our subscription is considered on a case-by-case basis; contact Elsevier through the policies link.
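Once you have a key, requesting a subscribed article is an ordinary HTTPS call. The Python sketch below only builds the request; it assumes Elsevier's Article Retrieval endpoint (https://api.elsevier.com/content/article/doi/) and the X-ELS-APIKey header, and the DOI shown is a placeholder. Check the Developer Portal for the current endpoint, the supported formats, and whether off-campus use needs an additional institutional token.

```python
import urllib.parse
import urllib.request

# Assumed Article Retrieval endpoint; confirm on Elsevier's Developer Portal.
ELSEVIER_ARTICLE_URL = "https://api.elsevier.com/content/article/doi/"


def build_elsevier_request(doi: str, api_key: str, accept: str = "text/xml") -> urllib.request.Request:
    """Build (but do not send) a full-text request for one DOI."""
    # Keep "/" unescaped so the DOI reads as a path, as in 10.1016/j....
    url = ELSEVIER_ARTICLE_URL + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(
        url,
        headers={
            "X-ELS-APIKey": api_key,  # key from your Developer Portal registration
            "Accept": accept,         # requested response format (assumed: XML)
        },
    )


# Placeholder DOI and key, for illustration only.
req = build_elsevier_request("10.1016/j.hypothetical.2020.01.001", "YOUR_API_KEY")
print(req.full_url)
```

Sending it with urllib.request.urlopen(req) should return the article text when the key is valid and the article is within the Library's subscription.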

HathiTrust Research Center

The HathiTrust Research Center (HTRC) enables computational analysis of the HathiTrust corpus and offers a variety of analysis tools that comply with copyright law. To use the Research Center you must create an account at HathiTrust Analytics; you can log in with your CNetID, and you will receive a verification email to finish creating the account.

IEEE

IEEE allows text mining, but you must contact them for permission. 

Their developer portal offers APIs that provide metadata, abstracts, and the full text of open access articles without prior permission.
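A minimal sketch of a metadata query, assuming the IEEE Xplore Metadata Search API endpoint https://ieeexploreapi.ieee.org/api/v1/search/articles and its apikey, querytext, format, and max_records parameters; check the current parameter list on the developer portal. It only builds the query URL. Mining full text beyond open access articles still requires IEEE's permission.

```python
import urllib.parse

# Assumed endpoint of the IEEE Xplore Metadata Search API; verify on IEEE's developer portal.
IEEE_SEARCH_URL = "https://ieeexploreapi.ieee.org/api/v1/search/articles"


def build_ieee_search_url(api_key: str, query: str, max_records: int = 25) -> str:
    """Return a metadata-search URL; the parameter names are assumptions to verify."""
    params = {
        "apikey": api_key,           # key issued when you register on the portal
        "querytext": query,          # free-text search terms
        "format": "json",            # ask for a JSON response
        "max_records": max_records,  # number of records per page
    }
    return IEEE_SEARCH_URL + "?" + urllib.parse.urlencode(params)


print(build_ieee_search_url("YOUR_API_KEY", "text mining", max_records=10))
```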

JAMA Network

Oxford University Press

Oxford University Press permits text mining of its journals for non-commercial use but does not publish the details. Contact them for information at Data.Mining@oup.com.

PLOS

SAGE Publishing

SAGE permits downloading of any articles within the Library's subscription for non-commercial use. They allow web scraping but prefer that researchers use the Crossref TDM API. You must adhere to the policies listed on their TDM page and accept the terms of their TDM license.
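The Crossref route works through publisher-supplied links in Crossref metadata: a record fetched from https://api.crossref.org/works/{DOI} can list full-text URLs whose intended-application is "text-mining". The sketch below filters those links out of an already-downloaded record. The sample record and its example.org URLs are invented for illustration, and downloading from the real links still falls under SAGE's TDM license.

```python
import json


def tdm_links(crossref_work: dict) -> list[dict]:
    """Return full-text links that a Crossref work record marks for text mining."""
    message = crossref_work.get("message", {})
    return [
        {"url": link["URL"], "content_type": link.get("content-type", "unspecified")}
        for link in message.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


# Invented sample shaped like a response from https://api.crossref.org/works/{DOI}
sample = json.loads("""
{
  "status": "ok",
  "message": {
    "DOI": "10.1177/hypothetical-example",
    "link": [
      {"URL": "https://example.org/doi/pdf/10.1177/hypothetical-example",
       "content-type": "application/pdf",
       "content-version": "vor",
       "intended-application": "text-mining"},
      {"URL": "https://example.org/doi/full-xml/10.1177/hypothetical-example",
       "content-type": "application/xml",
       "content-version": "vor",
       "intended-application": "similarity-checking"}
    ]
  }
}
""")

for link in tdm_links(sample):
    print(link["content_type"], link["url"])
```

Only the first link is returned, because the second is registered for similarity checking rather than text mining.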

SPIE Digital Library

SPIE allows text and data mining of licensed materials for research and educational or business purposes only. Information derived from text or data mining may be made public; however, the original copyrighted source documents may not be publicly distributed. SPIE does not offer an API or any other service to facilitate text and data mining.

Springer Nature

Springer Nature allows text mining of subscribed content collected from their websites under the following conditions:

  • Content may be downloaded manually or through automated means, but must be stored on a server only accessible to University of Chicago affiliates.
  • Automated means should not be faster than one request per second.
  • Content must be deleted at the conclusion of the text and data mining project.
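One way to honor the one-request-per-second condition is a small throttle that pauses before each request until at least a second has passed since the previous one. This is a generic sketch, not a Springer Nature tool; the class name MinIntervalThrottle is hypothetical, and the clock and sleep functions are injectable only so the timing can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous wait() returned

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # pause only for the time still owed
                now = self._clock()
        self._last = now


# Usage: at most one request per second.
throttle = MinIntervalThrottle(interval=1.0)
for doi in ["10.1007/hypothetical-1", "10.1007/hypothetical-2"]:
    throttle.wait()
    # download(doi)  # hypothetical download step, not shown
```

Because the interval is measured between the moments wait() returns, a slow download never triggers a burst of catch-up requests afterward.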

Contact us if you have any questions regarding the content that is covered by this agreement or conditions on downloading.

Access Springer Nature through these links, but note that we do not have access to everything that they publish.

University of Chicago Press

The University of Chicago Press allows text mining, but researchers must request permission from the Press if they intend to use software for automated downloading. 

Wiley

Wiley allows text & data mining of their content for non-commercial, scholarly research.
Users must log in and access Wiley content through Wiley's API and must agree to the click-through Wiley Text and Data Mining Agreement.
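The sketch below builds a Wiley TDM API request for one article. It assumes the endpoint https://api.wiley.com/onlinelibrary/tdm/v1/articles/ followed by the fully URL-encoded DOI, and a Wiley-TDM-Client-Token header carrying the token you receive after accepting the agreement; verify both against Wiley's current instructions. The DOI and token are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed Wiley TDM endpoint; the DOI is appended fully URL-encoded ("/" becomes %2F).
WILEY_TDM_URL = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"


def build_wiley_request(doi: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a full-text request for one Wiley DOI."""
    url = WILEY_TDM_URL + urllib.parse.quote(doi, safe="")
    # The token identifies you as a user who accepted the TDM agreement.
    return urllib.request.Request(url, headers={"Wiley-TDM-Client-Token": token})


req = build_wiley_request("10.1002/hypothetical.12345", "YOUR_TDM_TOKEN")
print(req.full_url)
```

Passing the request to urllib.request.urlopen should return the article if Wiley accepts the token and the article is within your access.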

For instructions on text and data mining Wiley content, obtaining an API key, and reviewing the terms and conditions, see: