Library Guides: Text and Data Mining: Scholarly Books and Journals

Using Ebooks and Journals for Text Mining

Many of the publishers whose content we license allow text mining, but they restrict how it can be done. In most cases you need to use an API rather than web scraping. If you are interested in a publisher that is not listed here, contact us and we will advise you of your options.

Brill

Brill permits text mining of its publications and of content the Library has purchased; however, this content cannot be downloaded automatically. Please contact us with the specifics of your project, and we will work with Brill to provide the content you need.

Cambridge University Press

Cambridge University Press includes TDM in its terms and conditions:

You may download, extract, store and index Content for the purposes of text and data mining ("TDM") for non-commercial research purposes only. Any copies of the Content stored locally by you for the purposes of TDM shall be deleted once such research project ends.

However, Cambridge does not allow the use of automated tools, and it limits downloads to 500 PDF files per hour. If you are interested in programmatic access, contact Cambridge at:
openresearch@cambridge.org

Cambridge Core Terms and Conditions

Constellate (formerly JSTOR Data for Research)

Constellate is a text analytics service from the parent company of JSTOR. The Library subscribes at the Pedagogy level, which allows you to create and save up to five datasets using materials from JSTOR and partner publishers. Constellate does not allow datasets to be downloaded for local analysis, but it does provide online tools, including hosted Jupyter notebooks for working in Python.

Create a JSTOR account to access the online tools and save datasets. You can also use the "Login with Google" option to sign in with your CNetID.

Constellate
Click on Dataset Builder to begin
Constellate is the text analytics service from the not-for-profit ITHAKA, parent company of JSTOR. It offers online analysis of text drawn from JSTOR and its publishing partners. Log in to save up to five datasets and access online analysis tools.

Elsevier ScienceDirect

Elsevier allows text mining of the content we subscribe to through its API. Register for API access at the Elsevier Developer Portal.
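As a rough illustration of what an API request involves, the Python below is a minimal sketch that builds (but does not send) a request for one article by DOI. It assumes Elsevier's Article Retrieval endpoint `https://api.elsevier.com/content/article/doi/` and the `X-ELS-APIKey` header; confirm both in the Developer Portal documentation. The function name is hypothetical and the key is a placeholder for the one you receive after registering.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: Elsevier's Article Retrieval endpoint; confirm on the Developer Portal.
ELSEVIER_ARTICLE_URL = "https://api.elsevier.com/content/article/doi/"


def build_elsevier_request(doi: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a full-text request for one article, by DOI."""
    # Percent-encode the DOI but keep "/" so it stays part of the URL path.
    url = ELSEVIER_ARTICLE_URL + urllib.parse.quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # ASSUMPTION: header carrying the portal-issued key
        "Accept": "text/xml",     # ask for the full text as XML
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen()` would send it; whether full text comes back depends on Elsevier recognizing the request as covered by our subscription.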

Requests for access to content outside our subscription are considered on a case-by-case basis; contact Elsevier through the policies link.

HathiTrust Research Center

The HathiTrust Research Center (HTRC) enables computational analysis of the HathiTrust corpus. It offers a variety of tools for doing analysis while complying with copyright law. You must create an account at HathiTrust Analytics to use the Research Center, but you can now log in with your CNetID. You will receive a verification email to complete account creation.

HathiTrust Research Center Home Page
More information about the HTRC
HathiTrust Research Center Analytics
Starting point for using the Research Center

IEEE

IEEE allows text mining, but you must contact them for permission.

IEEE's developer portal offers APIs that provide access to metadata, abstracts, and the full text of open access articles without prior permission.

IEEE API Portal
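For the metadata and abstract APIs, a query is an HTTP request with your key as a parameter. The Python below is a minimal sketch that only builds a search URL, so it runs without network access. The endpoint `https://ieeexploreapi.ieee.org/api/v1/search/articles` and the parameters `querytext`, `apikey`, `format`, `max_records`, and `start_record` are assumptions to verify on the IEEE API Portal; the function name is hypothetical and the key is a placeholder.

```python
from urllib.parse import urlencode

# ASSUMPTION: IEEE Xplore metadata search endpoint and parameter names;
# confirm them on the IEEE API Portal before use.
IEEE_SEARCH_URL = "https://ieeexploreapi.ieee.org/api/v1/search/articles"


def ieee_search_url(query: str, api_key: str,
                    max_records: int = 25, start_record: int = 1) -> str:
    """Return a metadata search URL for a free-text query (not sent here)."""
    params = {
        "querytext": query,            # free-text or Boolean query
        "apikey": api_key,             # key issued through the portal
        "format": "json",              # ask for a JSON response
        "max_records": max_records,    # page size
        "start_record": start_record,  # offset for paging through results
    }
    return IEEE_SEARCH_URL + "?" + urlencode(params)
```

Full text beyond open access articles is not covered by this kind of request; that still requires permission from IEEE.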

JAMA Network

JAMA Network API
Content from all American Medical Association journals in our subscription.

Oxford University Press

Oxford University Press permits text mining of its journals for non-commercial use but does not make details of the process publicly available. For information, contact OUP at:
Data.Mining@oup.com

OUP Text Mining Page

PLOS

PLOS Search API
Query the content of PLOS journals.
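The PLOS Search API accepts Solr-style queries over HTTP. The Python below is a minimal sketch that only builds a query URL, so it runs without network access. The base URL `https://api.plos.org/search` and the parameters `q`, `fl`, `wt`, `rows`, and `start` are assumptions based on PLOS's Solr interface; confirm them in the PLOS API documentation. The function name is hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the PLOS Search API (a Solr endpoint).
PLOS_SEARCH_URL = "https://api.plos.org/search"


def plos_search_url(query: str, fields=("id", "title"),
                    rows: int = 10, start: int = 0) -> str:
    """Return a search URL; `fields` limits which fields each result includes."""
    params = {
        "q": query,              # Solr query, e.g. 'title:"climate change"'
        "fl": ",".join(fields),  # comma-separated list of fields to return
        "wt": "json",            # ask for a JSON response
        "rows": rows,            # page size
        "start": start,          # offset of the first result
    }
    return PLOS_SEARCH_URL + "?" + urlencode(params)
```

Paging through a large result set means increasing `start` by `rows` on each successive request.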

SAGE Publishing

SAGE permits downloading of any article within the Library's subscription for non-commercial use. It allows web scraping but prefers that researchers use the Crossref TDM API. You must adhere to the policies listed on SAGE's TDM page and accept the terms of its TDM license.
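Crossref's TDM support works through full-text links that publishers add to each article's Crossref metadata record. The Python below is a minimal sketch that builds the metadata URL for a DOI and, given the JSON response, keeps only the links marked with an `intended-application` of `text-mining`. The field names follow Crossref's public REST API as we understand it; the function names are hypothetical, fetching the JSON is left out so the example runs offline, and SAGE's TDM license may add requirements for how the files themselves are retrieved.

```python
import json
from urllib.parse import quote

# Crossref's public REST API; a work record may list full-text "link" entries.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    """URL of the Crossref metadata record for one DOI."""
    return CROSSREF_WORKS_URL + quote(doi, safe="/")


def text_mining_links(work_json: str) -> list:
    """Return (URL, content-type) pairs the publisher marked for text mining."""
    record = json.loads(work_json)
    links = record.get("message", {}).get("link", [])
    return [
        (link["URL"], link.get("content-type", "unspecified"))
        for link in links
        if link.get("intended-application") == "text-mining"
    ]
```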

SPIE Digital Library

SPIE allows text and data mining of licensed materials for research and educational or business purposes only. Information derived from text or data mining may be made public; however, the original copyrighted source documents may not be publicly distributed. SPIE does not offer an API or any other service to facilitate text and data mining.

Springer Nature

Springer Nature allows text mining of subscribed content collected from its websites under the following conditions:

Content may be downloaded manually or through automated means, but it must be stored on a server accessible only to University of Chicago affiliates.
Automated downloading should not be faster than one request per second.
Content must be deleted at the conclusion of the text and data mining project.
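Staying under the one-request-per-second limit is easiest if every automated download passes through a small pacing helper. The Python below is a minimal sketch of such a helper; the class name `Throttle` is hypothetical and the download step itself is left out. The clock and sleep functions are parameters only so the pacing can be tested without real delays.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive requests (default: 1 s)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so pacing can be tested without waiting
        self.sleep = sleep
        self._last = None   # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Create one `Throttle(1.0)` and call its `wait()` immediately before each request in the download loop. Setting `min_interval` slightly above 1.0 leaves a small safety margin.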

Contact us if you have questions about which content this agreement covers or about the conditions on downloading.

Access Springer Nature through these links, but note that we do not have access to everything that they publish.

SpringerLink
Allows full-text searching of Springer-Verlag electronic resources, including thousands of electronic books in a wide variety of subjects. Includes most books published from 2005 and titles in book series from 1997. This database includes books, book chapters, and journal articles; search results can be limited by type of resource, author, subject, language, and publication date.

University of Chicago Press

The University of Chicago Press allows text mining, but researchers must request permission from the Press if they intend to use software for automated downloading.

University of Chicago Press Terms & Conditions
More information at the Press web site

Wiley

Wiley allows text & data mining of their content for non-commercial, scholarly research.
Users must log in and access Wiley content through Wiley's API and must agree to the click-through Wiley Text and Data Mining Agreement.

For instructions on text and data mining Wiley content, obtaining an API key, and reviewing the terms and conditions, please see:

Wiley Text and Data Mining Policies and Procedures
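As a minimal sketch of a Wiley API call, the Python below builds (but does not send) a request for one article by DOI. The endpoint `https://api.wiley.com/onlinelibrary/tdm/v1/articles/` and the `Wiley-TDM-Client-Token` header are assumptions based on Wiley's published examples and should be checked against the policies page above; the function name is hypothetical and the token is a placeholder for the key you obtain from Wiley.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: Wiley TDM API endpoint; confirm in Wiley's TDM documentation.
WILEY_TDM_URL = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"


def build_wiley_request(doi: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for one article, by DOI."""
    # Encode the whole DOI, including "/", so it forms a single path segment.
    url = WILEY_TDM_URL + urllib.parse.quote(doi, safe="")
    # ASSUMPTION: header name carrying the token Wiley issues to you.
    return urllib.request.Request(url, headers={"Wiley-TDM-Client-Token": token})
```

Sending the request with `urllib.request.urlopen()` should return the article if the token is valid and the article is within our subscription.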