
Text and Data Mining

Information on text and data mining resources available through the Library

Text Mining Newspapers

Text mining of newspapers is now available through ProQuest's TDM Studio online service. TDM Studio offers access to almost all sources in Global Newsstream and ProQuest Historical Newspapers. Our other news sources either do not allow text mining or place conditions on access.

Learn more in our guide to TDM Studio.

Available XML Files from ProQuest

The Library has purchased XML files for the following newspapers from ProQuest. These files can be downloaded for local analysis.
Because of publisher restrictions, these are the only years available to us at this time. To download the files, follow the links below to the ProQuest dataset records in the Library Catalog.
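
Once the files are downloaded, a first step in local analysis is usually pulling a few fields out of each XML record. The following is a minimal sketch using only the Python standard library. The element names `Title` and `Text` and the folder name are assumptions for illustration, not the documented ProQuest schema; open one of the downloaded files to see the actual tag names before adapting it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_fields(root, tags):
    """Return {tag: text} for the first matching descendant of each tag.

    Tags that do not occur in the record map to None. The tag names are
    supplied by the caller because the real schema is not assumed here.
    """
    record = {}
    for tag in tags:
        element = root.find(f".//{tag}")
        if element is None:
            record[tag] = None
        else:
            # itertext() also collects text nested inside child elements.
            record[tag] = "".join(element.itertext()).strip()
    return record


def load_folder(folder, tags):
    """Parse every .xml file in a folder into a list of field dictionaries."""
    records = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        records.append(extract_fields(root, tags))
    return records


# Hypothetical usage; the folder and tag names are placeholders:
# articles = load_folder("proquest_xml", ["Title", "Text"])
```

Passing the tag names as an argument keeps the sketch reusable across datasets whose XML structure differs.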

Available XML Files from East View

The Library has purchased XML files for these Russian-language newspapers. To download the files, follow the links below to the East View dataset records in the Library Catalog.

Library of Congress Newspapers

The Library of Congress provides API access to full-text newspapers in the Chronicling America collection. Chronicling America has digitized over 3,000 newspapers covering the years 1789-1963.
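
As a rough illustration of what API access looks like, the sketch below builds a page-search request and reads the JSON response with the Python standard library. The endpoint URL, the parameter names (`andtext`, `date1`, `date2`, `dateFilterType`, `rows`, `format`), and the `items` key in the response are assumptions based on the Chronicling America search API as it has commonly been documented; check them against the Library of Congress's current API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed search endpoint; verify against current Library of Congress docs.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(term, start_year, end_year, rows=20):
    """Build a page-search URL for a term within a range of years."""
    params = {
        "andtext": term,                # word(s) to search for in the page text
        "date1": start_year,            # first year of the range
        "date2": end_year,              # last year of the range
        "dateFilterType": "yearRange",  # interpret date1/date2 as years
        "rows": rows,                   # number of results per request
        "format": "json",               # ask for JSON instead of HTML
    }
    return SEARCH_URL + "?" + urlencode(params)


def fetch_pages(term, start_year, end_year, rows=20):
    """Request one batch of results and return the list of page records."""
    url = build_search_url(term, start_year, end_year, rows)
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # "items" is the assumed key holding the matching newspaper pages.
    return payload.get("items", [])
```

For example, `fetch_pages("suffrage", 1910, 1920, rows=5)` would request up to five matching pages from that decade. For larger projects, pace requests and consult the Library of Congress's guidance on bulk access.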

NewsBank Newspapers

NewsBank offers text mining access for a fee through a "walled garden" model, meaning that all analysis is performed on NewsBank's servers. The fee must be paid for each project, and each project requires submitting a project plan to NewsBank. Contact us for more information.