Text and Data Mining

Information on text and data mining resources available through the Library

JSTOR Text Analysis Support (formerly Constellate)

JSTOR Text Analysis Support allows you to download metadata for the entire JSTOR database and filter it to identify full-text content. You can submit a request for up to 1.5 million documents for offline analysis. You may keep the documents you receive until your project is complete, at which point you must delete them. Full-text extracts can be used for text analysis but not for training large language models (LLMs).

Requests that include copyrighted material are reviewed by JSTOR staff. Not all requests will be approved.

Create a JSTOR account to continue. You can also sign in with your CNetID by choosing the Google or Microsoft login option.

HathiTrust Research Center

The HathiTrust Research Center (HTRC) enables computational analysis of the HathiTrust corpus. It offers a variety of tools for doing analysis while complying with copyright law. You must create an account at HathiTrust Analytics to use the Research Center, but you can now log in with your CNetID. You will receive a verification email to finish account creation.

HTRC will sunset at the end of 2026. HathiTrust is investigating other ways to support this research.